Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add an all-in-one latex_to_pdf() function and sweet testing infrastructure #252

Merged
merged 8 commits into from Nov 2, 2018

Conversation

@pkgw
Copy link
Collaborator

pkgw commented Oct 31, 2018

The front page of our API docs will now look way better! And the "executable" tests now have better behavior, not touching the local user's cache or the network.

pkgw added 5 commits Oct 30, 2018
This should be helpful for people who want to embed the engine as a library.
Tests and higher-level API exposure coming soon.
Along with the existing options, add a mode where output files simply aren't
written to disk. Once again, tests forthcoming.
It was linked to TermcolorStatusBackend because of the note_highlighted
method, but it is straightforward and reasonable to make that a trait function
with a sensible default impl.
…files

I *always* have trouble doing this, so here's a function that does something
along the lines of what I usually want.
This gives us a nice concise example for the very top of the Tectonic API
documentation. And hopefully it is actually useful for people!

The testing infrastructure got a big revamp to allow us to actually execute
the doctest on the new API docs. Doctest and integration-test executables are
linked against the standard build of the main crate, so the supporting test
infrastructure has to live in that crate and can't be gated behind a
`#[cfg(test)]` attribute. I figured out a way to make this happen that I'm
happy with — it makes these tests possible, but should only bloat the
crate by a small amount.

Importantly, this approach means that we can also make it so that the tests in
`tests/executable.rs`, which launch the build `tectonic` binary, leverage the
same test mode. This is a nice win — before, these tests would go to the
network and pull down assets into the local user's cache. Now they use the
same set of test assets as everybody else.
@pkgw

This comment has been minimized.

Copy link
Collaborator Author

pkgw commented Oct 31, 2018

CC @Mrmaxmeier this code might be relevant to #246, if I'm understanding it correctly. The new all-in-one function might be something C users would be interested in. Also I'd be interested in any comments you have in general.

@pkgw

This comment has been minimized.

Copy link
Collaborator Author

pkgw commented Oct 31, 2018

Hmmm, the Circle CI test failed on the new doctests:

   Doc-tests tectonic

running 2 tests
qemu: Unsupported syscall: 384
**
ERROR:/build/qemu-cYhkSq/qemu-2.11+dfsg/qom/object.c:1645:object_get_canonical_path_component: code should not be reached
qemu:handle_cpu_signal received signal outside vCPU context @ pc=0x601ae1f2
error: test failed, to rerun pass '--doc'
Exited with code 127

This looks pretty clearly to not be a problem with the actual implementation. I'll try rerunning; maybe it was transient ...

@pkgw

This comment has been minimized.

Copy link
Collaborator Author

pkgw commented Oct 31, 2018

Woohoo, it was transient!

@Mrmaxmeier @rekka Would you either of you happen to have a chance to look this over? I'm trying to be better about having my changes reviewed by someone else ...

@codecov

This comment has been minimized.

Copy link

codecov bot commented Oct 31, 2018

Codecov Report

Merging #252 into master will increase coverage by 0.19%.
The diff coverage is 55.21%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #252      +/-   ##
==========================================
+ Coverage   41.12%   41.32%   +0.19%     
==========================================
  Files         134      135       +1     
  Lines       60364    60952     +588     
==========================================
+ Hits        24827    25188     +361     
- Misses      35537    35764     +227
Impacted Files Coverage Δ
src/lib.rs 0% <ø> (ø) ⬆️
src/status/mod.rs 45.45% <0%> (-11.69%) ⬇️
src/io/stdstreams.rs 1.61% <0%> (-0.77%) ⬇️
src/bin/tectonic.rs 0% <0%> (ø) ⬆️
tests/executable.rs 22.37% <0%> (+4.3%) ⬆️
tests/driver.rs 100% <100%> (+5.88%) ⬆️
tests/trip.rs 98.88% <100%> (+0.55%) ⬆️
tests/formats.rs 87.32% <100%> (+2.87%) ⬆️
tests/util/mod.rs 52.17% <100%> (-3.62%) ⬇️
tests/tex-outputs.rs 97.7% <100%> (+1.17%) ⬆️
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 45da81c...650d86f. Read the comment docs.

/// This convenience function tries to help with the annoyances of getting
/// access to the in-memory file data after the engine has been run.
pub fn into_file_data(self) -> HashMap<OsString, Vec<u8>> {
// There must be a better way to do this ... Note that you *cannot*

This comment has been minimized.

Copy link
@Mrmaxmeier

Mrmaxmeier Oct 31, 2018

Contributor

This avoids the temporary. It can panic though:

Rc::try_unwrap(self.io.mem.files)
  .expect("multiple strong refs to MemoryIo files")
  .into_inner()

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

I think I like this more ... avoids rebuilding the hashmap. And I am pretty sure that it will never panic unless the caller has made their own copy of files, in which case they're getting what they deserve.

/// test mode is activated in this module, the `default_bundle()` and
/// `format_cache_path()` functions return results pointing to the test asset
/// tree, rather than whatever the user has actually configured.
static mut CONFIG_TEST_MODE_ACTIVATED: bool = false;

This comment has been minimized.

Copy link
@Mrmaxmeier

Mrmaxmeier Oct 31, 2018

Contributor

An AtomicBool here would avoid the few bits of unsafe code. Not sure if it's worth it though. AtomicBools are safe to mutate by reference but slightly awkward to use.

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

I'd also suggest using AtomicBool to avoid any unnecessary unsafe. Using it is not that bad.
Define using static CONFIG_TEST_MODE_ACTIVATED: AtomicBool = AtomicBool::new(false);.
Reading is as easy as CONFIG_TEST_MODE_ACTIVATED.load(Ordering::SeqCst) and storing CONFIG_TEST_MODE_ACTIVATED.store(forced, Ordering::SeqCst).

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

Yes, I agree that this is better. Thanks for the suggestion!

@@ -89,14 +89,29 @@ impl IoSetup {
}
}


/// Where does the "primary input" stream come from?
enum PrimaryInputMode {

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

This seems to duplicate code from driver.rs. Wouldn't PrimaryInputMode::Undefined be better expressed as None for Option<PrimaryInputMode>?

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

Yes, it duplicates a bit. But in both cases, the types are private to their respective modules — they are more or less implementation details of the "builder"-type objects. Because they barely come with any code, I felt it was better to duplicate a bit than to try share the type within the crate. Reasonable enough?

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

That makes sense. I did not see the big picture.

/// For more sophisticated uses, use the [`driver`] module, which provides a
/// high-level interface for driving the typesetting engines with much more
/// control over their behavior.
pub fn latex_to_pdf<T: AsRef<str>>(latex: T) -> Result<Vec<u8>> {

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

Is tectonic thread safe now? If I remember correctly, there used to be unsynchronized static globals in the C core, and the access to the bundle is not synchronized. It seems that this function lets the user run multiple instances of the engine in parallel.

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

Good point. We are still not thread safe, and probably won't be for a long time: the TeX engine just has a lot of shared global state in the C code. It will take a long time to put that all into some kind of non-global struct.

At the moment I can think of two solutions:

  1. Bite the bullet and add locking to the engines.
  2. Skirt the issue and just mention in this function's documentation that it is not thread-safe in the way you describe.

I lean towards option 2. Thoughts?

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

I would strongly prefer option 1. Rust selling point is "threads without data races", enforced by the type system. A quick and dirty solution is to have a Mutex that has to be acquired at the beginning of latex_to_pdf(similar to the test infrastructure currently), and add a note in the documentation that at most one instance of the engine can be running at a time and so the function might block until the other instances are finished.

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

I am definitely sympathetic to that argument. I've worried a bit about possibly causing mysterious deadlocks for callers, but I'm pretty sure that can't actually happen if we implement things in a reasonable way.

If we add locking inside the crate, though, I think it should be done at a lower level: around the pieces that directly invoke the C code. Someone could create two ProcessingSession objects in separate threads and break things right now, anyway.

Yeah, I think I'm talking myself into thinking that we should bite the bullet and add this sort of locking into the crate itself:

  1. It's really not that complicated to do so
  2. Things will definitely break if you multithread
  3. I believe that with a single in-and-out mutex, there's no way that the locking can cause problems for callers.

This comment has been minimized.

Copy link
@rekka

rekka Nov 1, 2018

Contributor

That's definitely a concern. One way to communicate this would be introducing a singleton struct, say Tectonic, that would have methods latex_to_pdf and processing_session_builder. The user would have to first create the struct using Tectonic::new(), which would return an error if Tectonic were already in use.

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

I think that approach is a little less desirable because one day, hopefully, we will be thread-safe, and then the singleton Tectonic will be unnecessary and a bit inconvenient.

I'll investigate adding locking within the crate and see how it works out.

This comment has been minimized.

Copy link
@pkgw

pkgw Nov 1, 2018

Author Collaborator

Yeah, that was pretty easy to implement. I tend to think this is the right way to go.

Because the C/C++ engines use a ton of shared global state, they are not
thread-safe. Up to this point, the `tectonic` crate didn't do anything about
this: callers needed to use their own mutex to prevent engines from stomping
all over each other's toes.

Upon further thought and discussion, I think this approach was a bit silly. In
particular, I was afraid of creating a situation where callers might find that
their processes mysteriously deadlocked, but I don't think that's actually
possible. We just have one mutex, and it is acquired and released in a totally
straightforward manner. I have trouble seeing how any trouble might arise.

... Famous last words?
@pkgw

This comment has been minimized.

Copy link
Collaborator Author

pkgw commented Nov 1, 2018

OK, this PR now adds an internal mutex in the Tectonic crate as well as the other changes building up to the all-in-one function.

The Circle CI build failed again due to the same error as before, which was transient the first time. I've relaunched it. If this failure happen frequently we'll need to figure out some way to mitigate it.

Any final comments?

@pkgw

This comment has been minimized.

Copy link
Collaborator Author

pkgw commented Nov 2, 2018

OK, will need to investigate the QEMU/big-endian doctests, but let's go ahead and merge this.

@pkgw pkgw merged commit 684e8fa into tectonic-typesetting:master Nov 2, 2018
2 of 3 checks passed
2 of 3 checks passed
ci/circleci Your tests failed on CircleCI
Details
continuous-integration/appveyor/pr AppVeyor build succeeded
Details
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@pkgw pkgw deleted the pkgw:doc-intro branch Nov 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants
You can’t perform that action at this time.