Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

First-party conda support #326

Closed
aaronreidsmith opened this issue Apr 24, 2020 · 15 comments
Closed

First-party conda support #326

aaronreidsmith opened this issue Apr 24, 2020 · 15 comments
Labels
feature request A tag for feature requests help wanted

Comments

@aaronreidsmith
Copy link
Member

We have tried to implement first-party conda support several times (see #174, #263 , #298, and #300, and finally #325) and have not been able to figure it out.

This issue is to track any discussion around conda support. We are very interested in outside help for anyone who may be a conda expert

@Arnoldosmium
Copy link

Hi, do you consider publish the package to conda-forge? The conda recipe will be tracked at a conda forge repo (and it should auto upgrade most of the time). I can help out.

@aaronreidsmith
Copy link
Member Author

aaronreidsmith commented May 12, 2020

@Arnoldosmium We'd be happy to publish to conda-forge, and would welcome help! Let us know what needs to be done on our end; as you can see from the slew of issues/PRs, we are conda novices.

@Arnoldosmium
Copy link

Arnoldosmium commented May 12, 2020

@aaronreidsmith those are pretty helpful! Some questions:

  • why did we use meta renderer in the first place? Was it because the dependencies may change frequently?
  • can i add you as the maintainer after i file a PR on conda-forge side? (you might need to confirm in that PR)

Things will definitely be resolved once on conda-forge:

  • the package will be built cross platforms
  • the package will be built against all supported python versions so at least both python 3.6 and 3.7 can use it.

@aaronreidsmith
Copy link
Member Author

why did we use meta renderer in the first place? Was it because the dependency may change frequently?

We prefer to manage all of our dependencies in one place (requirements.txt) because when dependencies do change, it's difficult to manage multiple sources of truth. So render_meta.py was used to copy those dependencies into meta.yaml at build time.

Our CI/CD process builds/tests for every push and PR. This allows us to ensure that we always "stay green" when we merge up, or cut a release. Having a recipe maintained outside of our CI/CD pipeline is an unknown for us. Is there a chance we could have something working in a wheel file, but it would break the conda build process?

can i add you as the maintainer after i file a PR on conda-forge side? (you might need to confirm in that PR)

Yes, I think it would be good to have an alkaline-ml org member be listed as a maintainer. Could you hold off on that PR until we've nailed down the best approach here?

the package will be built cross platforms

Yeah, we would like to support macOS, Windows, and Linux

the package will be built against all supported python versions so at least both python 3.6 and 3.7 can use it.

With wheels, we support 3.5+ (currently 3.5 - 3.8) across macOS, Windows, and Linux (using manylinux). This also includes both 32-bit and 64-bit for Windows. We tend to be slower in introducing support for newer versions of Python, since many of our dependencies (scipy, numpy, scikit-learn) have very complex build chains that generally takes them time to support a new language level.

When we initially started looking into conda, we found a lot of our dependencies do not build for 3.5 (Cython, for example), so we dropped 3.5 support. We'd like to support 3.5, but if we had to drop any, this would be the one, as its end-of-life is September 2020 anyway.

My understanding, is that to build a 32-bit conda distribution, you actually need to have 32-bit hardware (rather than a VM). Is that correct? That was our reasoning for dropping 32-bit support in conda. I don't know the breakdown of people using conda on a 32-bit machine, but I think we would support 32-bit if possible.

We also had a couple questions on our end:

  • You mention the conda recipe/package should auto upgrade -- how does that work? Our approach was obviously the render_meta.py file, so we didn't have to update the SHA256 in meta.yaml after a deploy to PyPI.
  • Similar to the above, we would like not to have to maintain a separate list of requirements in the meta.yaml file in conda-forge
  • I know conda is heavily tied to Numpy version. In requirements.txt, we specify a minimum version; will we have to specify an exact version in conda?

We've been wanting to reach out to the Conda team about whether there's a well-defined pattern for deploying to conda in a CI/CD pipeline. That will probably also inform how we adjust our process. Do you have any thoughts on that?

@Arnoldosmium
Copy link

the conda recipe/package should auto upgrade

Usually conda forge's auto upgrading bot will update (PR) the recipe, including version, hash checksum, and dependency (not shown in the PR)

Indeed there could be cases where the recipe failed to be in sync with the requirement.txt; yet, this mismatch may happen at max once per release, and build pipeline can potentially capture and notify us.

we would like not to have to maintain a separate list of requirements in the meta.yaml file in conda-forge

If this is still the preference, we can potentially work out some customized script to generate meta.yaml before the conda build procedure starts off. Conda forge people may also provide some thoughts once we file the PR.

we specify a minimum version of numpy

Yep numpy version, just like all other pkgs, can have a range in conda.

Currently I'm trying to generate the meta.yaml from the published pypi pkg via conda skeleton. Some unexpected error regarding the numpy package occurred. I'm trying to figure out where the issue is rooted.

@aaronreidsmith
Copy link
Member Author

Hey @Arnoldosmium, in attempting to get all of my ducks in a row while responding to you, I ended up getting conda skeleton working.

We require numpy to build wheels, and conda skeleton was attempting to build numpy (something that is apparently a huge no-no in conda). I was able to mitigate this using this command:

$ conda skeleton pypi pmdarima --setup-options="sdist"

This created the recipe as a source distribution and I was able to edit it from there.

With that being said, I went ahead and set up the alkaline-ml/pmdarima-conda repo. Our thought process was, we would like to keep all of the build pipelines in our hands as much as possible, but wanted to detach them from the release cycle we have in place for PyPI (not having conda support limited us from pushing features when they were all reliant on each other in the past).

I have the Linux and macOS builds working fine, but cannot get Windows to build. I am attempting to build locally using a Windows VM, but I would love another set of eyes on it.

Initially, I had this set up to build across the following matrix (excluding 32-bit on macOS and Linux):

matrix:
  os: [windows-latest, macos-latest, ubuntu-latest]
  python-version: ['3.5', '3.6', '3.7', '3.8']
  architecture: ['x86', 'x64']

However, I couldn't get 3.5 to successfully build on any platform, so I dropped that. We are still attempting to build on 32-bit, but since I can't get Windows to build at all, I have no idea if the failures are different between 32- and 64-bit. If you have any thoughts/insights on either of those, they would be much appreciated.

I am going to add you as a contributor on that repo, if that is alright with you. Currently, all of the substantive work is living in a branch called get-actions-working.

@Arnoldosmium
Copy link

I believe the --setup-options="sdist" is the trick to go after reading the setup.py.

Maintaining the conda recipe in your own org is also a great way to go. We may be able to do some automation (either via conda skeleton or via some extraction from requirement.txt) after the build pipeline stables.

Thanks for including me as a contributor, I actually have several ideas and would love to test out.

So far the matrix setup looks kosher to me. We might be able to unblock 3.5 if more conda repos are included by .condarc (will dig into error msg to figure out).

@aaronreidsmith
Copy link
Member Author

I'm actually not an admin on the org, so I have to wait on @tgsmith61591 😅 We'll get you added soon!

@jmillerbrooks
Copy link

jmillerbrooks commented Sep 7, 2020

Hi @aaronreidsmith , first of all thank you so much for all of your work on this fantastic package, I love it!

I am attempting to run pmdarima and statsmodels 0.12 in the same project/notebook, and the only way that I can see to do that is through conda. I have reviewed the issues you mentioned above, as well as the branch you set up specifically for this issue, so please forgive me if I missed this somewhere I'm happy to look wherever you point me without detailed reanswering if that's the case.

You mentioned on May 23rd that you were able to build the meta.yaml with conda skeleton by specifying sdist (this part I am also able to replicate, however trying to run conda-build without editing meta.yaml fails (happy to post output if this is helpful but I imagine you're well aware of this). You then stated that you were able "edit it from there"

My question is this: how does one edit the meta.yaml to get this to work? I've tried looking at the most recently edited version of the meta.yaml on the above mentioned branch (https://github.com/alkaline-ml/pmdarima-conda/blob/get-actions-working/recipe/meta.yaml), and as far as I can tell the only major differences between that and the one I'm able to generate with conda skeleton are the removal of sdist from the script line of build, and the addition of some tests. Does one need to do anything else to get this to install this way, and are the tests necessary?

Happy to take any level of feedback you're willing to give on this, realize you're probably very busy making this package work and that this issue will go away soon with the support of statsmodels 0.12 or the addition of conda support, but if you have a quick answer I'd really appreciate it. Thank you either way for your great work on this project!

Edit: Don't know how I missed this but also completely glossed over the following bit of that meta file you have as well, not sure what this is doing to be honest, also if it helps at all I am running Mac OS Catalina:

  build:
    - cython >=0.29,<0.29.18
    - numpy >=1.17.3
    - setuptools
    - git                  [not win]
    - patch                [not win]
    - m2-git               [win]
    - m2-patch             [win]
    - {{ compiler('c') }}

@aaronreidsmith
Copy link
Member Author

aaronreidsmith commented Sep 7, 2020

Hey @jmillerbrooks thanks for the well-thought-out questions! Let address them in chunks:

how does one edit the meta.yaml to get this to work?

You're right in that I essentially just did the conda skeleton command from above, and then removed the sdist in install command in the output meta.yaml file. The only other thing I really did is add a proper test section to make sure our test suite is run when it is built via conda build.

I've tried looking at the most recently edited version of the meta.yaml on the above mentioned branch

The last "working" build I have uses this meta.yaml, from May 22. It works for macOS and Linux, but not Windows, but it is using v1.6.1 of pmdarima, so it may need a little tweaking (should be just the hash and the version at the top).

You could also try this one, but that one was giving me some issues (at build time it would use the system Python instead of conda's, so I am not super confident in it).

are the tests necessary?

For your purposes, likely not. As I am sure you are aware, as of v1.7.1, we pinned statsmodels<0.12, as it was causing issues with our test suite. If you are aware of the risks, you can remove the test section; in fact, most packages just have an import <package> test to make sure it imports correctly (see here), but we are a little more thorough.

support of statsmodels 0.12

If you have access to pip, you are able to override our requirements.txt file like this:

$ pip install --no-deps pmdarima

This will install just the requested package and ignore its dependencies (you will have to install them manually).

As you can see, using this method I have both statsmodels 0.12 and pmdarima 1.7.1 installed (even though 1.7.1 technically doesn't support 0.12):

System:
    python: 3.7.3 (default, Mar  6 2020, 22:34:30)  [Clang 11.0.3 (clang-1103.0.32.29)]
executable: /Library/Developer/CommandLineTools/usr/bin/python3
   machine: Darwin-19.6.0-x86_64-i386-64bit

Python dependencies:
        pip: 20.2.2
 setuptools: 49.6.0
    sklearn: 0.23.2
statsmodels: 0.12.0
      numpy: 1.18.2
      scipy: 1.5.2
     Cython: 0.29.17
     pandas: 1.1.0
     joblib: 0.16.0
   pmdarima: 1.7.1

I think this will solve your immediate issue of being able to run pmdarima and statsmodels 0.12 side-by-side, but keep in mind, the behavior is not supported by us, as it is untested.

or the addition of conda support

I'm trying on this one! 😄 Life and my day job tend to get in the way sometimes, lol. As you can see from most of the builds in alkaline-ml/pmdarima-conda, Windows is the bane of my existence, while macOS and Linux tend to be a little easier. Additionally, I don't have a Windows machine at my disposal, so it makes testing much harder. We'll have a longer-term solution out ASAP

not sure what this is doing to be honest

This section is essentially saying what needs to be installed to build the package. Not all of our dependencies need to be used at build time, so it is not the entirety of our requirements.txt file.

This m2-* dependencies only need to be used on Windows, as they are Windows-specific builds of their Unix counterparts (m2 is short for msys2). They are filtered out via preprocessing selectors; I thought I read somewhere that the # was not necessary, but I am having trouble finding a source on that. Could just be a bug I need to fix! Finally, since C compilers differ from system-to-system, it just says it needs a C compiler, not a specific one.

Let me know if you think I missed anything or if you have follow up questions!

@jmillerbrooks
Copy link

Awesome, thank you so much for your detailed response! I'm admittedly out of my depth and also sure you are all over this as well, but in checking what was failing in statsmodels 0.12 vs. pmdarima before running this, I did come across this fix (statsmodels/statsmodels#7020) on their end closing what seems to be the same issue (statsmodels/statsmodels#7019) raised in #376, like I said sure you're already all over it, but on the off chance it helps at all, I'm really grateful for both your work on pmdarima and the attention in your response.

In addition, on the off chance this is at all helpful as a loose test case, I went ahead and ran with the second meta.yaml you referenced, and only updated the hash and version to 1.7.1

I'm not completely confident of my interpretation here, but it looks like it may have used the system python, and the default statsmodels:

System:
    python: 3.8.5 (default, Sep  4 2020, 02:22:02)  [Clang 10.0.0 ]
executable: $PREFIX/bin/python
   machine: macOS-10.15.6-x86_64-i386-64bit

Python dependencies:
        pip: 20.2.2
 setuptools: 49.6.0.post20200814
    sklearn: 0.23.2
statsmodels: 0.11.1
      numpy: 1.19.1
      scipy: 1.5.0
     Cython: 0.29.17
     pandas: 1.1.1
     joblib: 0.16.0
   pmdarima: 1.7.1
+ exit 0

However, the results of conda list show both statsmodels of 0.12.0 from normal conda-forge channel and pmdarima 1.7.1 built from new local channel created with conda-build, which was the desired result. Small amount of messing around with both packages in a jupyter notebook has not revealed any issues, everyone seems to play nice so to speak. This is probably just a fundamental misunderstanding of how conda works (still trying to get my head around it), but offering this if it's in any way helpful and would be happy to mess around with it further as best I can if there are other things you'd like to know/would be helpful.

Again really appreciate your prompt and considered response and also your work on this package, thank you!

@aaronreidsmith
Copy link
Member Author

Thanks for pointing out that statsmodels fix! Once they have it published in a release, we can fix our work around.

Glad it's working for you! That's interesting that conda list shows statsmodels 0.12 but it doesn't show up in pmdarima.show_versions(). Since all you did was change the version and hash, it implies that pmdarima did install statsmodels 0.11, but your environment.yaml probably overrode it with 0.12. Regardless, there is still work to be done on the conda front 🙂

If you'd like to mess around with things, you're more than welcome to fork alkaline-ml/pmdarima-conda and put a PR in! We are not the biggest conda experts, so it has been a slow process for us. Hopefully we'll figure it out soon!

@NowanIlfideme
Copy link

NowanIlfideme commented Dec 10, 2020

Is there a reason Cython is pinned to Cython<0.29.18,>=0.29 ? This is the only package causing a dependency contradiction in my current conda-based project, where it installs Cython 0.29.21 if not given pmdarima. Adding pmdarima==1.8.0 to the pip section just downgrades Cython to 0.29.18

Edit: It appears #335 was the reason - perhaps the bandaid isn't needed any more? 😃

@aaronreidsmith
Copy link
Member Author

Yeah, we have had a few pinned-and-forgotten-to-unpin dependencies; I can look into that one.

I'm sure you know this, but to be clear, this issue is more for actual Conda distributions (.tar.bz2 files) rather than the .whl files being downloaded when you use the pip section of Conda.

@aaronreidsmith
Copy link
Member Author

We recently found out that pmdarima is being build for Conda via the conda-forge/pmdarima-feedstock repo, so we will go ahead and close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
feature request A tag for feature requests help wanted
Projects
None yet
Development

No branches or pull requests

4 participants