[Oxide] Automatic content detection #11173

Merged
merged 39 commits into from
May 12, 2023


adamwathan
Member

This PR adds experimental support for what we're calling "automatic content detection" — a feature that lets Tailwind detect the paths it needs to scan to figure out which classes it needs to generate completely automatically, no content configuration required.

To use it, just omit the content option from your configuration file:

  // tailwind.config.js
  module.exports = {
    // Bye bye, you won't be missed
-   content: [
-     "./app/**/*.{js,ts,jsx,tsx,mdx}",
-     "./pages/**/*.{js,ts,jsx,tsx,mdx}",
-     "./components/**/*.{js,ts,jsx,tsx,mdx}",
-   ],
    theme: {
      extend: {},
    },
    plugins: [],
  }

You can also enable it explicitly by configuring content to 'auto':

// tailwind.config.js
module.exports = {
  content: 'auto',
  theme: {
    extend: {},
  },
  plugins: [],
}

Tailwind will automatically scan every file in your project (excluding gitignored files) that might contain classes and generate all the CSS you need with no configuration.

Since this feature is currently experimental, a warning is issued in the terminal any time it's enabled to make sure you know it's not stable.

This feature is only planned for the new Oxide engine, so it will only be available on the oxide-insiders tag and not the regular insiders tag. Will share a lot more about what Oxide is and when we really want people to start playing with it when it's a bit further along over the coming weeks.

How it works

Getting this to "just work" given all of the different places people use Tailwind is a serious challenge. This first stab at the problem uses a few heuristics and assumptions that are working very well for the types of projects we've tested it in:

  • Files are scanned using the current working directory — there's no magical way to know what the "root" of your project is, so Tailwind assumes that the current working directory is the root of your project. You should always run your build scripts from the root of your project to make sure the right files are scanned. This is only likely to be something you need to think about in complex monorepo-type setups, where you'll want to cd to the right folder in your scripts or use npm run with the --prefix option to explicitly set the CWD.
  • Any gitignored paths are skipped — any files or folders matched by your .gitignore file will not be scanned for classes. This prevents giant dependency folders like node_modules from being scanned, as well as any directories where you are storing generated files (like compiled CSS or JS), which avoids infinite rebuild loops.
  • All top-level folders are registered as content paths — we watch every top-level folder for changes as if you configured them yourself with globs like ./components/**/*.js and ./pages/**/*.js. This makes sure we notice any new files or folders you create and scan them without you needing to restart your build process.
  • ...except ./public — we explicitly don't watch ./public/**/*.{whatever} because it's common to store generated assets like compiled CSS and JS in the ./public folder which can cause infinite rebuild loops, particularly in webpack where we can't actually register globs to watch and can only register directories. Instead, we explicitly watch each individual file in ./public that could contain classes, like an index.html file for instance.
  • Top-level files are watched individually — we can't watch ./**/* because webpack doesn't support globs which means we'd end up watching ./node_modules, so instead we watch every top-level file that might contain classes individually. This means creating a new top-level file currently requires restarting your build process. In practice though it's extremely rare to create new top-level files that contain classes.
  • All binary file extensions are skipped — we don't scan files that obviously won't contain classes, like images, videos, zip files, etc.
  • Stylesheet files are skipped — we don't look for classes in css, scss, sass, less, or styl files.
  • Common generated files are skipped — we explicitly don't scan known generated files like package-lock.json.
  • Tailwind configuration files are skipped — your tailwind.config.js file is never going to be a source of classes to include in the final CSS so we don't scan it.
  • Classes are detected using a known list of file extensions — we automatically watch for a long list of common file types that could contain Tailwind classes, like .js, .html, .php, even .json. This way you don't need to restart your dev server the first time you create one of these files in an existing project.
  • Additional template extensions are detected based on your specific project — when we scan your project for classes, we keep track of every file extension we see and add them to our master list of extensions to watch for your project. So if you are using some obscure templating language I've never heard of that uses the *.potato extension, we'll watch *.potato files in all folders as long as we see at least one *.potato file when the build process starts.
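The heuristics above can be approximated in a few lines of JavaScript. This is a minimal sketch under stated assumptions, not the actual Oxide implementation: `resolveContentPaths`, `mightContainClasses`, and the extension lists are hypothetical names and values chosen for illustration, and the input is assumed to be a flat list of project-relative paths using `/` separators with gitignored entries already filtered out.

```javascript
const path = require("node:path");

// Hypothetical lists, far shorter than anything a real tool would ship.
const KNOWN_TEMPLATE_EXTENSIONS = ["html", "js", "jsx", "ts", "tsx", "php", "json"];
const SKIPPED_EXTENSIONS = new Set([
  "png", "jpg", "gif", "mp4", "zip", // binary files
  "css", "scss", "sass", "less", "styl", // stylesheets
  "lock", // generated lock files
]);
const SKIPPED_FILES = new Set(["package-lock.json", "tailwind.config.js"]);

// Lowercased extension without the leading dot ("" when there is none).
function extensionOf(file) {
  return path.extname(file).slice(1).toLowerCase();
}

// True when a file is worth scanning for class names.
function mightContainClasses(file) {
  const ext = extensionOf(file);
  return ext !== "" && !SKIPPED_FILES.has(path.basename(file)) && !SKIPPED_EXTENSIONS.has(ext);
}

// `files`: project-relative paths ("src/app.jsx"), gitignored entries already removed.
function resolveContentPaths(files) {
  const scannable = files.filter(mightContainClasses);

  // Known extensions plus every extension actually seen in this project.
  const extensions = [...new Set([...KNOWN_TEMPLATE_EXTENSIONS, ...scannable.map(extensionOf)])];

  const staticFiles = [];
  const topLevelFolders = new Set();
  for (const file of files) {
    const [first, ...rest] = file.split("/");
    if (rest.length === 0 || first === "public") {
      // Root files and everything under ./public are listed one by one.
      if (mightContainClasses(file)) staticFiles.push(`./${file}`);
    } else {
      // Every other top-level folder gets a deep glob.
      topLevelFolders.add(first);
    }
  }

  const globs = [...topLevelFolders].map((folder) => `./${folder}/**/*.{${extensions.join(",")}}`);
  return { extensions, staticFiles, globs };
}
```

Calling `resolveContentPaths` with a listing that contains `templates/page.potato` adds `potato` to the extension list used in every folder glob, while `public/index.html` is returned as an individually listed file rather than as part of a glob.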

In our testing these heuristics work great, and we've been able to remove the content configuration from every one of our own projects that we've tried it in.

If for whatever reason these heuristics don't work properly for your project, you can explicitly configure content just like you were doing before, and Tailwind will respect that configuration and not try to do any automatic content detection at all. This way you always have the option of full control over which files are scanned for classes.

Known limitations

How things work currently isn't perfect, and there are a few known limitations you might run into depending on how your project is structured.

  • Creating a new file with an unseen extension that isn't in our safelist requires restarting your build process — if your build process is already running and you try to create your very first *.piledriver file, Tailwind won't notice it and you'll need to restart your script.
  • Running your build command from a different directory will scan the wrong paths — because we treat the current working directory as your project root, you need to run your build script from the right place or Tailwind will scan the wrong files. In practice you probably will never notice this — you are already doing this if you use npm run {command} because that's how npm run already works.
  • You can't force specific gitignored files to be scanned while automatically detecting every other file — if you have some specific files you need to scan that live in node_modules but you want to ignore everything else in node_modules, you currently need to opt-out of automatic content detection and go back to explicitly configuring your content paths.
  • Creating a new top-level folder that includes files that need to be scanned requires restarting your build process — since we only watch top-level folders for new file events and not the entire project root, creating new top-level folders requires restarting your build process. A common example of where you might run into this is creating a ./components folder for the first time in a Next.js project while your dev server is already running.
  • The only way to explicitly prevent scanning a path is to gitignore it — you can't tell Tailwind not to scan a folder for classes without also gitignoring that folder. If you need more control, you need to opt-out of automatic content detection and configure content explicitly.

Despite these limitations, we're still finding automatic content detection to be miles ahead of explicit content configuration in terms of developer experience, and for projects that are structured in a conventional way you pretty much don't ever see or feel these limitations at all.

Planned improvements

While we're ready to start shipping support for this in our oxide-insiders builds as-is, we do have some improvements we plan to explore that will hopefully make the experience even better:

  • Skipping gitignored folders within top-level folders — because of limitations with webpack's dependency tracking APIs our current implementation doesn't skip gitignored folders unless they are top-level. So if you have something like ./src/node_modules, we still scan that folder. We should be able to solve this though, maybe even before we merge this PR.
  • Support for scanning specific paths in addition to automatically detected paths — using something like a new @source "./node_modules/my-library/dist/**/*.js" directive in your CSS, we hope to make it possible to scan paths that live within ignored directories without opting out of automatic content detection. This will also make it possible to scan for classes in parent/sibling directories, which some people might need in certain monorepo setups.
  • Configure content paths more intelligently based on the running build tool — not all build tools offer the same amount of control when it comes to registering paths we need to watch for changes, with webpack being the most limited. Currently we are solving for the lowest common denominator, but that's where limitations like "can't notice newly created top-level folders" come from. We can technically solve that in tools like Vite that offer more control, but to do that we need to detect the build tool you're using and intelligently register different dependency paths. We plan to explore this and see what we can come up with.

Really excited about this one, I think it's the biggest step-function improvement to the developer experience in Tailwind since the JIT engine. Looking forward to getting everyone playing with it so we can refine our heuristics and get things feeling as rock-solid as possible.

@ArnaudBarre

ArnaudBarre commented May 7, 2023

Hi, and first of all thanks for all the great work and the thought put into every API and into Tailwind in general. It totally changed the way I author CSS in the last three years.

Would it be possible for Tailwind to offer a more bundler-friendly API that lets another tool select the appropriate files to be scanned (and also handle the change detection)? The last point you made still feels like you want Tailwind to detect the build tool, instead of providing an API for build tools to create plugins on top of it.

I know the current setup is nice for people from other languages to even be able to run Tailwind without Node, but for ESM-first tools like Vite, we have some hacks in the HMR handling specifically for Tailwind.

Letting the bundler dictate the content being scanned means that you only scan the code that is in the final bundle, which leads to multiple benefits:

  • faster: you don't scan unrelated content
  • multiple scoped CSS bundles in the same directory are easy
  • importing components from a shared folder one level above the project folder works

@jpsc

jpsc commented May 7, 2023

This seems like a great DX improvement for common projects. And this is coming from someone who doesn't have a problem with needing to set content.

> Support for scanning specific paths in addition to automatically detected paths: using something like a new @source "./node_modules/my-library/dist/**/*.js" directive in your CSS, we hope to make it possible to scan paths that live within ignored directories without opting out of automatic content detection.

Why would this be better than setting content?

I will definitely need to use this or the content because we use a component library via an npm package and node_modules are usually gitignored.

@adamwathan
Member Author

@ArnaudBarre I think that all sounds great and would love to explore it more concretely — any interest in connecting about it sometime in the next week or two?

@adamwathan
Member Author

> Why would this be better than setting content?

@jpsc It's the same really — another thing we're trying to do for v4 is support more configuration from your CSS file instead of needing the JS config is all. You already need a CSS file no matter what, it would be nice if you could do everything in one place instead of needing two files.

Comment on lines +128 to +131
!(await fs
  .stat(filePath)
  .then(() => true)
  .catch(() => false))

existsSync is a better solution for this, and will avoid calling catch. Alternatively, statSync has a { throwIfNoEntry: false } option that avoids the try/catch as well.

import fs from 'node:fs'

if (!fs.existsSync(filePath))
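To make the comparison concrete, here is a small sketch of the three ways of checking for a path discussed in this thread. The helper names are hypothetical and only `node:fs` is used. Note that it is `throwIfNoEntry: false` that makes `statSync` return `undefined` for a missing path; the default, `true`, throws.

```javascript
const fs = require("node:fs");

// Synchronous check: true when the path exists, no try/catch needed.
function existsViaExistsSync(filePath) {
  return fs.existsSync(filePath);
}

// With `throwIfNoEntry: false`, statSync returns undefined for a missing
// path instead of throwing ENOENT.
function existsViaStatSync(filePath) {
  return fs.statSync(filePath, { throwIfNoEntry: false }) !== undefined;
}

// Promise-based variant mirroring the snippet under review: a rejected
// stat() call is mapped to false.
async function existsViaStat(filePath) {
  return fs.promises.stat(filePath).then(
    () => true,
    () => false
  );
}
```

The synchronous variants block the event loop while they run, so whether they are preferable to the promise-based check depends on where in the build the call happens.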

src/util/normalizeConfig.js
@@ -1,14 +1,21 @@
  import log from './log'

  export function validateConfig(config) {
-   if (config.content.files.length === 0) {
+   if (config.content.files !== 'auto' && config.content.files.length === 0) {

It seems this change is unnecessary, since files.length === 0 invalidates !== 'auto'
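The point can be checked in isolation with a short sketch, assuming `content.files` is always either the string `'auto'` or an array. The function names are hypothetical and only compare the two conditions from the diff.

```javascript
// Condition before the change.
function isEmptyBefore(files) {
  return files.length === 0;
}

// Condition after the change, with the explicit 'auto' guard.
function isEmptyAfter(files) {
  return files !== "auto" && files.length === 0;
}

// 'auto' is a four-character string, so its length is never 0 and both
// conditions give the same answer for every sample.
const samples = ["auto", [], ["./src/**/*.js"]];
const agree = samples.every((files) => isEmptyBefore(files) === isEmptyAfter(files));
```

Under that assumption the guard does not change behaviour, although it may still be worth keeping for readability.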

@ArnaudBarre

@adamwathan I would be happy to discuss/explore this! I am quite available until May 17 (on UTC+2 TZ)

@RobinMalfait
Contributor

> Skipping gitignored folders within top-level folders: because of limitations with webpack's dependency tracking APIs our current implementation doesn't skip gitignored folders unless they are top-level. So if you have something like ./src/node_modules, we still scan that folder. We should be able to solve this though, maybe even before we merge this PR.

This part is solved now ✅

@jpsc

jpsc commented May 8, 2023

> > Why would this be better than setting content?
>
> @jpsc It's the same really. Another thing we're trying to do for v4 is support more configuration from your CSS file instead of needing the JS config is all. You already need a CSS file no matter what, it would be nice if you could do everything in one place instead of needing two files.

Ok, that's understandable. Why would one file be less than two? So you could completely drop tailwind.config and just have the tailwind cli?

@ChristophP

Hmm, as you mentioned, it is not so easy to find heuristics that reliably identify the relevant files. Good trick skipping binary files, though.
But overall, to me it seems like the cost of adding content to the config is very low (one-time setup) compared to the potential pitfalls and the additional unnecessary file watching (repeatedly) that I assume will likely happen with this approach.

@adamwathan
Member Author

Going to merge this in so it's easier for people to play with and test — no guarantees we don't change directions here but hard to know how it plays out without just giving it a shot and trying to iterate on it 👍

@RobinMalfait RobinMalfait merged commit a7f7b76 into master May 12, 2023
21 checks passed
@RobinMalfait RobinMalfait deleted the feat/auto-content branch May 12, 2023 14:13
RobinMalfait added a commit that referenced this pull request May 12, 2023
* resolve all _existing_ content paths

* pin `@napi-rs/cli`

* WIP: Log all resolved content files/globs

* only filter out raw changed content in non-auto mode

* skip parseCandidateFiles cache in `auto` mode

* improve algorithm of detecting content paths

1. Files in the root should be listed statically instead of using globs.
2. Files and folders in special known direct child folders should be
   listed statically instead of using globs (e.g.: `public`). This is
   because these special folders are often used to store generated AND
   source files at the same time. Using globs could trigger infinite
   loops because we are watching and acting upon dist files.
3. All file extensions found in the project, should be used in the globs
   in addition to a known set of extensions.
4. Direct folders seen from the root, can use the glob syntax
   `<root>/src/**/*.{...known-extensions}`

* inline wanted-extensions

Not 100% convinced yet, but seems cleaner so far.

* ensure writing a file also makes the parent folder(s)

* add integration tests for the auto content feature

* add pnpm and bun lock files

* Revert "inline wanted-extensions"

This reverts commit 879c124.

* sort binary-extensions and add lockb

* sort + add `lock` to ignored extensions

* drop `yarn.lock`, because lock extensions are already covered

* group template extensions

This will make it a bit easier to organize in the future.

* drop empty lines and commented lines from template-extensions

* skip the config path when resolving template files

The config file will automatically trigger a rebuild when this file is
changed. However, this should not be part of the template files because
that could cause additional css that's not being used.

* make `auto content` the default in the oxide engine

- In the oxide engine, the default `content: []` will be dropped from
  the default configuration (config.simple.js, config.full.js).
- If you have `content: []` or `content: { files: [] }` then the auto
  content feature won't be active. However if those arrays are empty a
  warning will still be shown. Adding files/globs or dropping the
  `content` section completely will enable auto content.

* only test the auto content integration test in the oxide engine

* set `content.files` to `auto` instead of using `auto: boolean`

This way we don't run into the issue where the `config.content.files` is
set and the `config.content.auto` is set to true.

* drop log

* ensure we validate the config in the CLI

* show experimental warning for automatic content detection

* use cached version of the getCandidateFiles instead of bypassing it

* use `is_empty()` shorthand

Thanks, Clippy!

* add test to ensure nested ignored folders are not scanned

* add `tempfile` for tests

* add auto content tests in Rust

* refactor auto content detection

This will also make sure that if we have (deeply) nested ignored
folders, then we won't use deeply nested globs (**/*.{js,html}) for the
parent(s) of the nested ignored folders but instead use a shallow glob
for each directory (*/*.{js,html}).

Then each sibling directory of the parent can use deeply nested globs
again except for the direct parent.

* use consistent comments

* ensure ignored static listed files are not present

* improve performance by ~30x

On a big test project this goes from ~6s to ~200ms

* improve performance by ~5x

We started with a ~6s duration
Then in the previous commit, we improved it by ~30x and it went down to
~200ms
Now with this change, it takes about ~40ms. That's another ~5x
improvement.

Or in total a ~150x improvement.

* ensure nested folders in `public/` are also explicitly listed

* add shortcut for normalizing files

This is only called once so won't do anything to the main performance of
Tailwind CSS. But always nice to make small performance improvements!

* run Rust tests in CI

* fix lint warnings

* update changelog

* Update CHANGELOG.md

---------

Co-authored-by: Robin Malfait <malfait.robin@gmail.com>