Bundle rework #1121

Draft · rm-dr wants to merge 67 commits into master from betterbundle

Commits (67, all by rm-dr) · the changes shown below are from 15 commits:
f84252d  Nov  4, 2023  Rewrote CachingBundle as a universal wrapper
7f24467  Nov  4, 2023  Added http retries & sleep
2f5e1e5  Nov  4, 2023  Minor cleanup
52c1ba2  Nov  4, 2023  Minor fixes
6d4b25a  Nov  5, 2023  Fixed reader
eea0a85  Nov  5, 2023  Added index to cache
827febb  Nov  5, 2023  Minor cleanup
96027f4  Nov  5, 2023  Fixed all_files()
c59d394  Nov  5, 2023  Fixed print order
e95371a  Nov  5, 2023  Minor edit
6a29531  Nov 25, 2023  Cleaned up `bundle search`
4fdf435  Nov 25, 2023  Rearranged printing
f9f373c  Nov 25, 2023  Converted zip bundle to new format
020a6a9  Nov 25, 2023  Improved bundle type detection
6b17609  Nov 25, 2023  Merge new commits from 'master'
ca1ad16  Nov 25, 2023  Minor edits
ed329f4  Dec  1, 2023  Improved search method
3219374  Dec  2, 2023  Cleaned up cache dir code
cebcce9  Dec  3, 2023  Minor cleanup
b938dba  Dec  3, 2023  Improved bundle detection
6a99e50  Dec  3, 2023  Documentation
bc4c46c  Dec  3, 2023  Minor fix
5c952f5  Dec  3, 2023  Reverted zip bundle changes
707927a  Dec  3, 2023  Minor fix
cb25f8b  Dec  3, 2023  Added ttbv1 bundles
0e4f272  Dec  3, 2023  Added fill_index_external
78262fe  Dec  8, 2023  Added file size check
9adccff  Dec  8, 2023  Moved get_digest impl for dir and zip
a58a3cb  Dec  9, 2023  Reworked bundle caching
ed339e9  Dec  9, 2023  Removed resolve_url
a759d31  Dec 10, 2023  Removed StatusBackend where it isn't needed
fd5f8b8  Dec 10, 2023  More minor cleanup
be27209  Dec 10, 2023  Cleanup: cache paths
d730a5e  Dec 10, 2023  Typo
2498fd9  Dec 10, 2023  Cleanup, added download log
4489de4  Dec 11, 2023  Minor cleanup
5e7b153  Dec 11, 2023  Added retries and real_len
8e49106  Dec 11, 2023  Added support for configurable search orders
e18574f  Dec 11, 2023  Cleanup
dcfa379  Dec 11, 2023  Minor cleanup
bab87db  Dec 11, 2023  Minor cleanup
ca5c2ac  Dec 11, 2023  Minor cleanup
2cc7a51  Dec 11, 2023  Docs
5d51068  Dec 11, 2023  Minor fix
a738bcb  Dec 11, 2023  Handle absolute paths
e93bf9a  Dec 11, 2023  Minor fixes
bdd92cc  Dec 11, 2023  Fixed itar caching
df7b451  Dec 15, 2023  Minor cleanup
fcbda56  Dec 15, 2023  Comment
31bc4be  Dec 15, 2023  Fixed links for autodetect function
8de7864  Dec 15, 2023  Merged latest changes from 'master'
ba56a6c  Dec 19, 2023  Added default url redirect resolution
7e60f3f  Feb 10, 2024  Harfbuzz bump
80a09fa  Feb 10, 2024  Merge branch 'master' into betterbundle
df80e8e  Feb 10, 2024  Clippy
cf09067  Feb 10, 2024  Renames for process safety
be1647a  Feb 10, 2024  Minor edits
0f9721c  Feb 10, 2024  Fixed tests
742d2f7  Feb 10, 2024  Clippy
8818b60  Feb 11, 2024  Clippy
4ed3b22  Feb 21, 2024  Merge branch 'master' into betterbundle
d2a3490  Feb 28, 2024  Merge branch 'master' into betterbundle
d11a8cf  Feb 29, 2024  Argument edits
5625602  Feb 29, 2024  Better errors & minor cleanup
2dfa321  Feb 29, 2024  Bundle path tweaks
53828f5  Feb 29, 2024  Clippy
4d3f8b6  Feb 29, 2024  Merge branch 'master' into betterbundle
783 changes: 314 additions & 469 deletions crates/bundles/src/cache.rs

Large diffs are not rendered by default.

267 changes: 139 additions & 128 deletions crates/bundles/src/itar.rs
Updated file contents:

//! The web-friendly "indexed tar" bundle backend.
//!
//! The main type offered by this module is the [`ItarBundle`] struct,
//! which can (but should not) be used directly as a [`tectonic_io_base::IoProvider`].
//!
//! Instead, wrap it with a [`crate::cache::BundleCache`] for filesystem-backed
//! caching.
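//!
//! A minimal sketch of the intended usage (the `BundleCache` constructor
//! shown here is an assumption for illustration, not the exact API):
//!
//! ```ignore
//! let itar = ItarBundle::new("https://example.com/bundle.tar".to_owned(), status)?;
//! // Hypothetical wrapper call; see `crate::cache::BundleCache` for the real interface.
//! let mut bundle = BundleCache::new(Box::new(itar), status)?;
//! ```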
//!
//! While the on-server file format backing the “indexed tar” backend is indeed
//! a standard `tar` file, as far as the client is concerned, this backend is
//! centered on HTTP byte-range requests. For each file contained in the backing
//! resource, the index file merely contains a byte offset and length that are
//! then used to construct an HTTP Range request to obtain the file as needed.
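//!
//! For example, a hypothetical index entry
//!
//! ```text
//! plain.tex 102400 512
//! ```
//!
//! records that `plain.tex` is the 512 bytes starting at byte offset 102400,
//! so the client fetches it with an HTTP `Range: bytes=102400-102911` request.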

use crate::Bundle;
use flate2::bufread;
use std::collections::HashMap;
use std::io::{BufRead, BufReader, Cursor, Read};
use std::{thread, time};
use tectonic_errors::prelude::*;
use tectonic_geturl::{DefaultBackend, DefaultRangeReader, GetUrlBackend, RangeReader};
use tectonic_io_base::{InputHandle, InputOrigin, IoProvider, OpenResult};
use tectonic_status_base::{tt_note, tt_warning, StatusBackend};

const MAX_HTTP_ATTEMPTS: usize = 4;
const RETRY_SLEEP_MS: u64 = 250;

/// The internal file-information struct used by the [`ItarBundle`].
#[derive(Clone, Copy, Debug)]
pub struct FileInfo {
    offset: u64,
    length: usize,
}

/// A simple web-based file backend based on HTTP Range requests.
///
/// This bundle does not cache on its own; you probably want to wrap it
/// in a [`crate::cache::BundleCache`].
#[derive(Debug)]
pub struct ItarBundle {
    url: String,

    /// Maps all available file names to [`FileInfo`]s.
    /// This is empty when the bundle is created, so constructing an
    /// `ItarBundle` needs no network access; [`ItarBundle::get_index`]
    /// fills it automatically when it is first needed.
    index: HashMap<String, FileInfo>,

    /// The [`RangeReader`] responsible for sending range queries.
    /// This is `None` when the bundle is created and is connected
    /// lazily, the first time a file is actually requested.
    reader: Option<DefaultRangeReader>,
}

impl ItarBundle {
    /// Make a new `ItarBundle`.
    ///
    /// This method does not touch the network; the index is fetched
    /// lazily when it is first needed.
    pub fn new(url: String, _status: &mut dyn StatusBackend) -> Result<ItarBundle> {
        Ok(ItarBundle {
            index: HashMap::new(),
            reader: None,
            url,
        })
    }

    /// Fill `self.index` from the remote `.index.gz` file.
    fn get_index(&mut self, status: &mut dyn StatusBackend) -> Result<()> {
        let mut geturl_backend = DefaultBackend::default();
        let resolved_url = geturl_backend.resolve_url(&self.url, status)?;

        let index_url = format!("{}.index.gz", &resolved_url);
        tt_note!(status, "downloading index {}", index_url);
        let reader =
            bufread::GzDecoder::new(BufReader::new(geturl_backend.get_url(&index_url, status)?));

        self.index.clear();
        for line in BufReader::new(reader).lines() {
            if let Ok((name, info)) = Self::parse_index_line(&line?) {
                self.index.insert(name, info);
            }
        }

        Ok(())
    }

    /// Parse one line of the index file.
    fn parse_index_line(line: &str) -> Result<(String, FileInfo)> {
        let mut bits = line.split_whitespace();

        if let (Some(name), Some(offset), Some(length)) = (bits.next(), bits.next(), bits.next()) {
            Ok((
                name.to_owned(),
                FileInfo {
                    offset: offset.parse::<u64>()?,
                    length: length.parse::<usize>()?,
                },
            ))
        } else {
            // TODO: preserve the warning info or something!
            bail!("malformed index line");
        }
    }
}
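
// A minimal sanity check for `parse_index_line`. The sample entry below is
// made up for illustration; it is not part of any real bundle index.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_well_formed_index_line() {
        let (name, info) = ItarBundle::parse_index_line("plain.tex 102400 512").unwrap();
        assert_eq!(name, "plain.tex");
        assert_eq!(info.offset, 102400);
        assert_eq!(info.length, 512);
    }

    #[test]
    fn rejects_malformed_index_line() {
        assert!(ItarBundle::parse_index_line("plain.tex 102400").is_err());
    }
}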

impl IoProvider for ItarBundle {
    fn input_open_name(
        &mut self,
        name: &str,
        status: &mut dyn StatusBackend,
    ) -> OpenResult<InputHandle> {
        // Fetch the index if we don't have it yet.
        if self.index.is_empty() {
            if let Err(e) = self.get_index(status) {
                return OpenResult::Err(e);
            }
        }

        let info = match self.index.get(name) {
            Some(a) => a,
            None => return OpenResult::NotAvailable,
        };

        let mut buf = Vec::with_capacity(info.length);

        tt_note!(status, "downloading {}", name);

        // Connect the reader if it isn't already connected.
        if self.reader.is_none() {
            let mut geturl_backend = DefaultBackend::default();
            let resolved_url = match geturl_backend.resolve_url(&self.url, status) {
                Ok(a) => a,
                Err(e) => return OpenResult::Err(e),
            };
            self.reader = Some(geturl_backend.open_range_reader(&resolved_url));
        }

        // Our HTTP implementation actually has problems with zero-sized ranged
        // reads (Azure gives us a 200 response, which we don't properly
        // handle), but when the file is 0-sized we're all set anyway!
        if info.length == 0 {
            return OpenResult::Ok(InputHandle::new_read_only(
                name,
                Cursor::new(buf),
                InputOrigin::Other,
            ));
        }

        // Get the file, with retries. Historically, sometimes our web service
        // would drop connections when fetching a bunch of resource files
        // (i.e., on the first invocation). The error manifested itself in a
        // way that has a not-so-nice user experience. Our solution: retry the
        // request a few times in case it was a transient problem.
        let mut succeeded = false;
        for n in 0..MAX_HTTP_ATTEMPTS {
            // The reader was connected above, so the unwrap is safe.
            let mut stream = match self
                .reader
                .as_mut()
                .unwrap()
                .read_range(info.offset, info.length)
            {
                Ok(r) => r,
                Err(e) => {
                    tt_warning!(status, "failure requesting \"{}\" from network", name; e);
                    thread::sleep(time::Duration::from_millis(RETRY_SLEEP_MS));
                    continue;
                }
            };

            if let Err(e) = stream.read_to_end(&mut buf) {
                tt_warning!(status, "failure downloading \"{}\" from network", name; e.into());
                // Drop any partially-read data before retrying.
                buf.clear();
                thread::sleep(time::Duration::from_millis(RETRY_SLEEP_MS));
                continue;
            }

            succeeded = true;
            if n != 0 {
                // At least one earlier attempt failed.
                tt_note!(status, "download succeeded after retry");
            }
            break;
        }

        if !succeeded {
            // All attempts failed.
            return OpenResult::Err(anyhow!(
                "failed to retrieve \"{}\" from the network; \
                 this most probably is not Tectonic's fault \
                 -- please check your network connection.",
                name
            ));
        }

        OpenResult::Ok(InputHandle::new_read_only(
            name,
            Cursor::new(buf),
            InputOrigin::Other,
        ))
    }
}

impl Bundle for ItarBundle {
    fn all_files(&mut self, status: &mut dyn StatusBackend) -> Result<Vec<String>> {
        if self.index.is_empty() {
            // Try to fetch the index if we don't have it yet. If this
            // fails, we return an empty list rather than an error.
            let _ = self.get_index(status);
        }

        Ok(self.index.keys().cloned().collect())
    }

    fn get_location(&mut self) -> String {
        self.url.clone()
    }
}
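
A minimal end-to-end sketch of the new flow, assuming the crate exports this type as tectonic_bundles::itar::ItarBundle; the bundle URL is a placeholder:

use std::io::Read;
use tectonic_bundles::{itar::ItarBundle, Bundle};
use tectonic_io_base::{IoProvider, OpenResult};
use tectonic_status_base::NoopStatusBackend;

fn main() -> tectonic_errors::Result<()> {
    let mut status = NoopStatusBackend::default();

    // Placeholder URL; point this at a real indexed-tar bundle.
    let mut bundle = ItarBundle::new("https://example.com/bundle.tar".to_owned(), &mut status)?;

    // The first open triggers the lazy index download and reader connection.
    match bundle.input_open_name("plain.tex", &mut status) {
        OpenResult::Ok(mut handle) => {
            let mut contents = Vec::new();
            handle.read_to_end(&mut contents)?;
            println!("fetched {} bytes", contents.len());
        }
        OpenResult::NotAvailable => println!("file not in bundle index"),
        OpenResult::Err(e) => return Err(e),
    }

    // `all_files` also fills the index on demand.
    println!("bundle holds {} files", bundle.all_files(&mut status)?.len());
    Ok(())
}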