Support for data files installation #358

Closed
nthiery opened this issue Jun 24, 2020 · 13 comments · Fixed by #510

Comments

@nthiery

nthiery commented Jun 24, 2020

It would be handy to have some analogue of setuptools's data_files entry to be
able to configure the installation of data files (or did I miss it?).

Thanks in advance!

@bpabel
Contributor

bpabel commented Jun 24, 2020

If you place your data files inside your package, flit will include them with your package.
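
For data that lives inside the package like this, code can locate it at runtime with the standard library. A minimal sketch, assuming Python 3.9+ and a hypothetical package `mypkg` that ships `data/greeting.txt` (both names are illustrative, not from this thread):

```python
from importlib.resources import files  # standard library, Python 3.9+


def read_package_text(package: str, *parts: str) -> str:
    """Read a text file shipped inside an importable package.

    `package` and `parts` are placeholders for your own layout; the
    helper itself is hypothetical, not part of Flit.
    """
    resource = files(package)
    for part in parts:
        # Traversable objects support "/" for descending into children.
        resource = resource / part
    return resource.read_text(encoding="utf-8")


# Hypothetical usage, assuming mypkg/data/greeting.txt exists:
# text = read_package_text("mypkg", "data", "greeting.txt")
```

This only works for files inside the package; it does not help with files installed elsewhere, which is the case this issue is about.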

@takluyver
Member

Hi Nicolas!

You haven't missed it - there's currently no equivalent of data_files. Flit handles the equivalent of package_data automatically, but has no way to put data files outside the package.

Flit deliberately doesn't aim to cover every possibility in Python packaging: it's aimed at the 80/20 case (80% of the utility with 20% of the effort). And so far, external data files have been on the other side of that invisible boundary. In part that's because there isn't a simple, reliable way for code to know where external data files went (as far as I know), so you can't use it for data that your code will access.

I'm aware of some use cases for shipping things like man pages. That's kind of straying from library distribution to application distribution, though I accept that's a fuzzy distinction (e.g. we support scripts), and there's not necessarily a better option for distributing small applications written in Python. 🤷

The feature set isn't carved in stone. If there's a significant use case that requires external data files, I might accept an extra feature, either a general mechanism like setuptools or a specific one to handle e.g. man pages. But I don't want to replicate everything setuptools can do, so there's a line somewhere.

(This also potentially touches on the question of build steps, issue #119, because it might be desirable to generate things like man pages)

Thanks,
Thomas

@nthiery
Author

nthiery commented Jun 24, 2020 via email

@saulshanabrook

I wanted to note that to add an extension to Jupyter Server, you have to ship a data file: https://jupyter-server.readthedocs.io/en/latest/developers/extensions.html#distributing-a-server-extension

@bollwyvl

bollwyvl commented Nov 5, 2020

> isn't a simple, reliable way for code to know where external data files went

It seems like most of the use cases (man pages, Jupyter config/share cruft) are about something else (maybe not Python-adjacent) knowing about your files, without having to import/inspect them. But yeah, if they don't get cleaned up after the fact, that's a big problem.

> a general mechanism

Maybe a Path.rglob-driven syntax, with negation, would save some of the weird pain of data_files requiring an explicit list:

[tool.flit.prefix_files."share/jupyter/labextensions/my-extension"]
"js" = ["yarn.lock"]
"js/dist" = ["*", "!*.map"]
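
To make the proposed pattern syntax concrete, here is a rough sketch of how such patterns might be resolved. The semantics are an assumption on my part (plain patterns go to `Path.rglob`, a leading `!` excludes by file name), and `select_files` is a hypothetical helper, not anything Flit provides:

```python
from fnmatch import fnmatch
from pathlib import Path
from typing import Iterable, List


def select_files(root: Path, patterns: Iterable[str]) -> List[Path]:
    """Return files under `root` chosen by `patterns`, relative to `root`.

    Assumed semantics: a plain pattern is passed to Path.rglob(); a
    pattern starting with "!" drops files whose name matches it.
    """
    patterns = list(patterns)
    includes = [p for p in patterns if not p.startswith("!")]
    excludes = [p[1:] for p in patterns if p.startswith("!")]

    selected = set()
    for pattern in includes:
        for path in root.rglob(pattern):
            if path.is_file():  # rglob also yields directories; skip them
                selected.add(path.relative_to(root))

    return sorted(
        path
        for path in selected
        if not any(fnmatch(path.name, pattern) for pattern in excludes)
    )


# Hypothetical usage for the "js/dist" entry above:
# select_files(Path("js/dist"), ["*", "!*.map"])
```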

This single feature is what's been keeping me from using flit, so 👍, and while I wouldn't know where to start, we'd probably be able to figure it out, if such a PR would be welcome!

@palao

palao commented Dec 30, 2021

One of the things I like most about Flit is its "do one thing and do it well" approach. This is a very big plus for me given the great confusion I feel in (most of) the rest of the Python packaging world. And I want to thank you very much for your excellent work in this regard.

However, for me it is a significant handicap not to be able to distribute man pages (and possibly local documentation in other formats like HTML) together with the rest of my code.
As a user, I expect a decent application to come with a man page, at least. A very long help message is not always a good solution, IMHO.

So @takluyver, I would be very much in favor of this feature. I offer my help too. I'd be delighted to contribute.

I understand that the task is more complex than it sounds. No doubt.

Would it be an idea to implement this as a kind of plugin, opt-in feature?

@takluyver
Member

Would it work for people if Flit allowed data files, but all you could specify was a single data directory, with no filtering or rearrangement of the files to be installed? So your source tree might look something like this:

├── data
│   └── share
│       └── man
│           └── man1
│               └── sfollow.1.gz
├── LICENSE
├── pyproject.toml
├── README.rst
└── sfollow
    ├── __init__.py
    ├── ...

And then in pyproject.toml, you would specify something like data_directory = "data".

The rearrangement that setuptools allows doesn't really hide any complexity, because you still need to represent the full destination paths, and also understand the translation between source paths and destination paths.

I think this should be fine for anything like man pages or Jupyter nbextension files which are only going to be used once installed, because it doesn't really matter where they are in the source tree, so long as you know where to put them. If there's anything that needs to be at a particular location in the source tree, it's not so convenient, but I suspect that's less common, and you could symlink it (Windows users are keen to tell me that actually Windows is basically fine with symlinks these days :-).
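
As a rough illustration of "no filtering or rearrangement": in the wheel format, files under `{name}-{version}.data/data/` are installed relative to the installation prefix, so a single data directory can be mapped one-to-one. The sketch below is a hypothetical helper, not Flit's implementation, and it assumes `name` is already in its normalized wheel form:

```python
from pathlib import Path
from typing import Dict


def wheel_data_entries(data_dir: Path, name: str, version: str) -> Dict[str, Path]:
    """Map wheel archive names to source files, keeping relative paths as-is.

    Hypothetical helper: every file below `data_dir` is placed under
    "<name>-<version>.data/data/" with the same relative path.
    """
    prefix = f"{name}-{version}.data/data"
    entries = {}
    for path in sorted(data_dir.rglob("*")):
        if path.is_file():
            # Archive names in a wheel use "/" separators on every platform.
            rel = path.relative_to(data_dir).as_posix()
            entries[f"{prefix}/{rel}"] = path
    return entries


# Hypothetical usage for the tree above (the version number is made up):
# wheel_data_entries(Path("data"), "sfollow", "0.1")
```

Because the destination path is just the source path with a fixed prefix, there is no separate source-to-destination translation to keep in your head.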

> Would it be an idea to implement this as a kind of plugin, opt-in feature?

It would certainly be opt-in, but I'd rather not go down the plugin route - my experience is that plugins are pretty much the antithesis of 'do one thing and do it well'.

@bollwyvl

Yeah, a single data_directory (or even a key named exactly data_files) pointing at a single path that already exists before flit build is run, copied into sys.prefix and removed on uninstall, would meet all non-abusive, portable use cases I've seen 😊.
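
For a sense of where such files would end up, the standard library's `sysconfig` reports the data directory of the current environment, which is usually the same as `sys.prefix`. This is only a sketch: `expected_install_path` is a hypothetical helper, and installs done with options such as `--user` or `--target` use a different base, which is the "no reliable way to know" problem mentioned earlier in the thread.

```python
import sysconfig
from pathlib import Path


def expected_install_path(relative: str) -> Path:
    """Guess where a wheel data file lands in the *current* environment.

    Only a guess: it uses the default install scheme, so it is wrong for
    installs that targeted another location.
    """
    return Path(sysconfig.get_path("data")) / relative


# Hypothetical usage with the man page from the example tree:
# expected_install_path("share/man/man1/sfollow.1.gz")
```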

Would you anticipate any way to filter what's included, or is that part of the exists-before-flit contract?

@takluyver
Member

My thinking was to have no filtering at all, just install anything that's sitting in that directory. Please feel free to make a case for some sort of filtering if you think it would be useful - it's not decided yet. But if we can cover 80% of the use cases by doing the simple thing, that's always my preference.

@palao

palao commented Dec 31, 2021

@takluyver, thank you for taking this request into consideration.

Your proposal seems to be reasonable.

About filtering: do you mean something like file collisions, for instance?

@takluyver
Member

> About filtering: do you mean something like file collisions, for instance?

Filtering would mean e.g. 'this folder is data files, but exclude this subfolder', or 'but exclude any files with a .html extension', or more complex rules like 'exclude this subfolder except for files with a .json extension'. I'm thinking that Flit won't support this, so all you can say is 'this folder is the data files'.

File collisions shouldn't be an issue 🤞. If we don't do any rearrangement, there shouldn't be collisions going from the source files to the wheel. And collisions when trying to install the wheel are up to whatever does that (e.g. pip).

@nthiery
Author

nthiery commented Jan 1, 2022

@takluyver: for whatever it's worth, from the use cases I have witnessed so far, your proposal above sounds to me like a good candidate for an 80/20 implementation.

Happy new year to flit and everyone involved :-)

@takluyver
Member

I've had a go at this in #510.
