Mac M1 Support #1101

Closed
camaya7 opened this issue Feb 19, 2021 · 36 comments
Labels
dependencies, waiting for answer

Comments

@camaya7

camaya7 commented Feb 19, 2021

Hi, I'm wondering if Ludwig currently has support for the Mac M1? I have tried to install it several times through the steps on the website and no luck.

I've downloaded TensorFlow 2.4.0-rc0 (the only one available for M1) separately, as "pip install ludwig" wasn't getting anywhere. I kept getting dependency conflict errors and corrected most of them, but there seems to be no workaround for TF; see the conflicts below.

The conflict is caused by:
ludwig 0.3.3 depends on tensorflow>=2.3.1
ludwig 0.3.2 depends on tensorflow>=2.3.1
ludwig 0.3.1 depends on tensorflow>=2.2
ludwig 0.3 depends on tensorflow>=2.2
ludwig 0.2.2.8 depends on tensorflow==1.15.3
ludwig 0.2.2.7 depends on tensorflow==1.15.3
ludwig 0.2.2.6 depends on tensorflow==1.15.2
ludwig 0.2.2.5 depends on tensorflow==1.15.2
ludwig 0.2.2.4 depends on tensorflow==1.15.2
ludwig 0.2.2.3 depends on tensorflow==1.15.2
ludwig 0.2.2.2 depends on tensorflow-gpu==1.15.2
ludwig 0.2.2 depends on tensorflow-gpu==1.15.2
ludwig 0.2.1 depends on tensorflow==1.14.0
ludwig 0.2 depends on tensorflow==1.14.0
ludwig 0.1.2 depends on tensorflow==1.13.1
ludwig 0.1.1 depends on tensorflow==1.13.1
ludwig 0.1.0 depends on tensorflow>=1.12

To fix this you could try to:

  1. loosen the range of package versions you've specified
  2. remove package versions to allow pip attempt to solve the dependency conflict

It seems M1's limited TF availability is not letting the Ludwig install get through the TF dependencies.

Expected behavior
Successfully install Ludwig.

Environment (please complete the following information):

  • OS: Big Sur 11.2.1
  • Python 3.8
  • Ludwig 0.3.3

Thanks

@tgaddair
Collaborator

Hey @camaya7, can you share the full log when you attempt to run pip install ludwig? Also, do you already have TensorFlow installed before you attempt to install Ludwig?

There shouldn't be any incompatibility between Ludwig and this version of TensorFlow. The issues likely arise from the way TensorFlow names its pip packages for specific hardware, which often conflict with the standard package names. It's usually something we can work around by either letting Ludwig install first and installing TensorFlow afterwards, or (in the worst case) by installing Ludwig from source and removing the tensorflow dependency from the requirements.txt file.
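
To see which TensorFlow distribution pip actually has registered in an environment, a minimal sketch along these lines may help. The candidate names other than tensorflow and tensorflow-gpu are assumptions about how hardware-specific builds might be named, and the helper itself is hypothetical, not part of Ludwig.

```python
from importlib import metadata

# Hypothetical list of distribution names to probe; adjust for your setup.
CANDIDATES = ("tensorflow", "tensorflow-gpu", "tensorflow-macos", "tensorflow-cpu")


def installed_distributions(names=CANDIDATES):
    """Return {name: version} for each candidate distribution visible to this interpreter."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed under this name; try the next candidate.
            continue
    return found


if __name__ == "__main__":
    print(installed_distributions() or "no TensorFlow distribution found")
```

If the environment reports a hardware-specific name but not tensorflow itself, pip will still try to resolve tensorflow from the index, which would be consistent with the conflict described here.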

@camaya7
Author

camaya7 commented Feb 22, 2021

Hey @tgaddair, attached is the log, not sure if this is what you're looking for as it's a bit long.

screen.log

TensorFlow was installed as TensorFlow 2.4.0-rc0. For a while I tried installing Ludwig first and then TF, but ran into the same problem with conflicting dependencies. Please let me know what you think here. In the worst case I could try installing Ludwig from source without the tensorflow dependency in the requirements.txt file.

@w4nderlust
Collaborator

Ludwig v0.3.3 supports tensorflow>=2.3.1 so in theory this should work, not really sure what is causing the issue.
One thing I would suggest trying is to clone Ludwig locally, modify requirements.txt to contain tensorflow==2.4.0-rc0, and install by running pip uninstall ludwig && pip install . from the ludwig directory.
Let us know if this helps as a workaround.

@w4nderlust added the dependencies and waiting for answer labels Feb 23, 2021
@camaya7
Author

camaya7 commented Feb 23, 2021

Hi @w4nderlust, I've tried the steps you suggested and got the following error message:
ERROR: Could not find a version that satisfies the requirement tensorflow==2.4.0-rc0 (from ludwig)
ERROR: No matching distribution found for tensorflow==2.4.0-rc0

This is with the same TensorFlow version active in the same virtual environment as Ludwig. I also ran the install without the TensorFlow dependency and got a similar error:
ERROR: Could not find a version that satisfies the requirement tfa-nightly==0.12.0.dev20201215223743 (from ludwig)
ERROR: No matching distribution found for tfa-nightly==0.12.0.dev20201215223743

@tgaddair
Collaborator

Hey @camaya7, you may want to try unpinning tfa-nightly and installing the latest version. It appears that specific version is not available with M1.
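
As a rough illustration of what "unpinning" could look like when done mechanically, here is a minimal sketch that strips the version specifier from selected packages in requirements text. The function name is hypothetical, the numpy line in the example is made up for illustration, and the sketch also drops any extras or environment markers on the lines it rewrites.

```python
import re


def unpin(requirements_text, packages):
    """Strip version specifiers from the named packages, leaving other lines untouched."""
    targets = {p.lower() for p in packages}
    out = []
    for line in requirements_text.splitlines():
        # The package name is the leading run of name characters,
        # i.e. everything before the first specifier such as "==" or ">=".
        match = re.match(r"\s*([A-Za-z0-9._-]+)", line)
        if match and match.group(1).lower() in targets:
            out.append(match.group(1))
        else:
            out.append(line)
    return "\n".join(out)


# The first two lines mirror pins quoted in this thread; "numpy" is illustrative only.
example = "tensorflow>=2.3.1\ntfa-nightly==0.12.0.dev20201215223743\nnumpy"
print(unpin(example, ["tfa-nightly"]))
```

Unpinning only helps if some version of the package is published for the platform, which is worth checking separately.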

@camaya7
Author

camaya7 commented Mar 5, 2021

Hey @tgaddair, I've unpinned tfa-nightly and have tried installing the latest version separately yet I keep running into the same error:
ERROR: Could not find a version that satisfies the requirement tfa-nightly
ERROR: No matching distribution found for tfa-nightly
It may have to do with the TF connection to tfa-nightly but I haven't quite found the workaround.

@w4nderlust
Collaborator

@camaya7 debugging this is a bit tricky because of the specific machine, sorry about that. I'm wondering: can you, in general, independently of Ludwig, install a tf + tfa combination that works on M1? In case you can find such a combination, then you should be able to use that in Ludwig.

Alternatively, you may remove the tfa dependency from Ludwig, as it is used specifically in a single module (sequence_decoders.py). This will make it so you won't be able to do sequence and text generation, but at least you may use all the other features of Ludwig.
Sorry for the inconvenience. I believe M1 support will improve in the future for TF and, as a consequence, for Ludwig.

@camaya7
Author

camaya7 commented Mar 5, 2021

@w4nderlust it's okay, the tech needs to catch up. So far I've only found a TF build that works on M1 but not a tf + tfa combination. I'm going to continue searching for one so that I could use it with Ludwig--in that case would I just install Ludwig without the tf, tfa-nightly dependencies per the requirements.txt file?

I'm also going to try to remove the tfa dependency altogether as I wouldn't need sequence and text gen for now either way. Hopefully this gets me some progress with the install.

No worries, I look forward to seeing M1 improvements for TF and Ludwig.

@carlogrisetti
Contributor

@camaya7 I have had the same pip behavior recently (cascading between different package versions and then ultimately failing).
The solution was to uninstall the package I was trying to install and reinstall it, and/or to install with pip install ludwig --force, which tells pip to reinstall all the packages involved.

In my case there was some cache/temp corruption, and this fixed it.

@camaya7
Author

camaya7 commented May 27, 2021

@carlogrisetti Hi Carlo, thanks so much for the tip, you're the man. It installed Ludwig correctly for the most part.

I'm dealing with a "zsh: illegal hardware instruction" error whenever I try to run anything like ludwig train or ludwig experiment. Any idea what this could be? I've installed TensorFlow 2.4.0-rc0 separately in case that's what was missing, but nothing yet.

@carlogrisetti
Contributor

This may help you
apple/tensorflow_macos#143

@camaya7
Author

camaya7 commented Jun 10, 2021

@carlogrisetti thanks for the tips. In an attempt to resolve the previous issue, I uninstalled ludwig and haven't been able to get it up and running again. It seems my cache is corrupt and there are conflicts with Tensorflow and other packages. Tensorflow-2.4.0-rc0 throws "Could not find a version that satisfies the requirement" errors so I've removed it from the requirements.txt file although I don't know how this will affect running ludwig later as it happened before as mentioned above.

Running pip install ludwig --force no longer works. Any advice here? It'd be highly appreciated.

@w4nderlust
Collaborator

Debugging package dependency issues is always tricky. Would creating a new virtualenv be an option? That may be the most straightforward way.

@camaya7
Author

camaya7 commented Jun 14, 2021

@w4nderlust Thanks, I tried out a new virtualenv and it helped in installing ludwig and its packages but I'm still getting an "illegal hardware instruction" error once I try to train a model, even with TF correctly installed. I'm getting the feeling I'm going to have to use another machine...
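
One thing that may be worth ruling out with an "illegal hardware instruction" crash is an architecture mismatch between the running Python interpreter and the installed binary packages. That is an assumption about a possible cause, not a diagnosis confirmed at this point in the thread. A minimal sketch for printing what the interpreter reports about itself:

```python
import platform
import sys


def interpreter_summary():
    """Collect basic facts about the Python interpreter that is actually running."""
    return {
        # Typically "arm64" for a native Apple Silicon build and "x86_64"
        # for an Intel build (including one running under emulation).
        "machine": platform.machine(),
        "system": platform.system(),
        "python": platform.python_version(),
        "executable": sys.executable,
    }


for key, value in interpreter_summary().items():
    print(f"{key}: {value}")
```

Running this inside the same virtualenv or conda environment used for ludwig train shows which interpreter is really in use.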

@amholler
Collaborator

amholler commented Jul 2, 2021

FWIW, here's how I set up my mac m1 for ludwig.
https://docs.google.com/document/d/1Q1t9x4GN9rMnilMt2MohaiplmjUPOTcBkS2KpeT_p3s/edit?usp=sharing

@camaya7
Author

camaya7 commented Jul 3, 2021

@amholler Thanks for this, I'm going to give it a shot and report with results.

@amholler
Collaborator

amholler commented Jul 4, 2021

Sounds good, @camaya7. BTW, I updated the Google doc to indicate that I just successfully got through the first epoch of my experiment of running "python train_higgs_small.py" with the update to reduce eval_batch_size. Woohoo.
The less good news is that my Activity Monitor memory status is yellow; AFAICT, the Python training run is swapping. Maybe I should reduce the batch size. I have a 16G M1; how about you?

@amholler
Collaborator

amholler commented Jul 6, 2021

Hmm, @camaya7 not sure what the issues are. Just to verify, you did the steps in my google doc
under "Install Tensorflow" into a conda environment and you were on the step of trying to run
"pip3 install ." from a local copy of the ludwig source tree? What error did you see?

@camaya7
Author

camaya7 commented Jul 6, 2021

@amholler, yeah, I followed the steps for setting up the conda environment and got to the "pip install ." step from the source tree. Then I ran into a bunch of errors from individual packages not installing correctly, and so on. So I installed all the Ludwig requirements separately, and Ludwig installed all the way through. However, when I try to run a model, I now get the error "illegal hardware instruction ludwig train". I think it's a problem with TF. I'm checking, but I can't find what I missed from your instructions.

@amholler
Collaborator

amholler commented Jul 6, 2021

yeah, I believe that that means you are running the wrong version of TF.
Does your "conda list" output look like mine (in the last section of the doc)?
Did you install TF by running the following in your conda environment?
pip install --upgrade --force --no-dependencies https://github.com/apple/tensorflow_macos/releases/download/v0.1alpha3/tensorflow_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl https://github.com/apple/tensorflow_macos/releases/download/v0.1alpha3/tensorflow_addons_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl
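
The wheel filenames in that command encode which interpreter and platform they target. As a minimal sketch (a hypothetical helper that assumes the common five-field filename form with no optional build tag), the fields can be split out and compared against the running interpreter:

```python
import sys


def parse_wheel_filename(filename):
    """Split a wheel filename into (name, version, python_tag, abi_tag, platform_tag)."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) != 5:
        # Wheels with an optional build tag have six fields; not handled in this sketch.
        raise ValueError(f"unexpected number of fields in: {filename}")
    return tuple(parts)


def current_cpython_tag():
    """Return the CPython tag for the running interpreter, e.g. cp38 for Python 3.8."""
    return f"cp{sys.version_info.major}{sys.version_info.minor}"


name, version, py_tag, abi_tag, plat_tag = parse_wheel_filename(
    "tensorflow_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl"
)
print(name, version, py_tag, plat_tag)
print("python tag matches this interpreter:", py_tag == current_cpython_tag())
```

Here the cp38 tag indicates a build for CPython 3.8, and the platform tag ends in arm64, so the wheel would be expected to need a Python 3.8 interpreter running natively on Apple Silicon.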

@camaya7
Author

camaya7 commented Jul 7, 2021

@amholler, I was pretty much running into the same errors as you so I followed through with your file, thanks so much for documenting it all. Ludwig is finally properly installed on my machine.

@jimthompson5802
Collaborator

jimthompson5802 commented Jul 16, 2021

I just noticed this on Anaconda's blog re: Apple's M1 chip support. It was published yesterday, July 15.
https://www.anaconda.com/blog/apple-silicon-transition

Conclusion of the posting
[image: screenshot of the post's conclusion]

@w4nderlust
Collaborator

To be honest, I don't agree with that assessment.
It is true that today (July 2021) doing data science on an M1 Mac is more difficult than doing it on an Intel Mac, but the reason is not that the machines are "not aimed at the data science and scientific user yet". The reason is that most of the data science stack in Python relies on libraries with heavy C/C++ bindings that need to be compiled for ARM64, and the developers of these libraries have not released support for this architecture 9 months after the introduction of the M1-based machines.
It is the Python ecosystem that is not ready, not the machines.
I believe the situation will improve quickly, in particular because ARM is making it into the data center and into other consumer products (Windows laptops with Qualcomm processors, Samsung getting into the ARM chip market, the Grace architecture from Nvidia and so on), so the maintainers of those libraries will likely need to adapt and release ARM-compatible versions. Because of the many cores on these processors, I expect them to be actually much better than current Intel-based ones for data science tasks (benchmarks of the custom M1-compiled TF already suggest that).

From the Ludwig point of view, we rely on TensorFlow at the moment for the tensor computation, so as soon as ARM support becomes a first-class citizen in TF, we'll be able to provide a seamless experience for those machines. Until then, all we can do is share experience and workarounds like @amholler did, so that M1 (and ARM in general) users can use Ludwig.

@redwrasse

Out of curiosity, has anyone figured out a better way of building ludwig on an m1 at this point? I'm dealing with this same class of problems on other projects.

@luisrh01

Hi. Still having issues, I tried the process mentioned above in the DOCX file, but since Conda now uses TF2.6, there are conflicts with package versions for Ludwig… has anyone cracked the code on this?

@tgaddair
Collaborator

Hey @luisrh01, you can try using TF 2.6 with Ludwig, most things should work, though we ran into a few issues related to distributed training (so if you're not doing distributed training, it should be generally supported). We'll also be completing a migration to PyTorch in the next few weeks, which could help simplify things here as well.

@nickovs

nickovs commented Jan 11, 2022

I managed to get (the newer, PyTorch-based) Ludwig going on my M1 Pro without too much trouble. The following assumes that you have a fully installed Xcode with command line tools set up and that you have Homebrew installed.

First there are a couple of key libraries that you'll need to install: hdf5 and openblas.

brew install openblas hdf5

Also, if you want to use the image feature type (which many people will), then it turns out you also need a Rust compiler, since one of its dependencies is tokenizers, which is a wrapper on top of a library written in Rust. You can install this with Homebrew too:

brew install rust

You will also need to tell the Python packages where to find these libraries since they seem not to properly use pkgconfig:

export OPENBLAS=$(/opt/homebrew/bin/brew --prefix openblas)
export HDF5_DIR=$(/opt/homebrew/bin/brew --prefix hdf5)
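
Before kicking off a long source build, it may be worth confirming that those two variables are set and point at real directories. A minimal sketch, using a hypothetical helper name:

```python
import os


def missing_build_dirs(names=("OPENBLAS", "HDF5_DIR"), environ=None):
    """Return the variables that are unset or do not point at an existing directory."""
    environ = os.environ if environ is None else environ
    # An unset variable falls back to "", which os.path.isdir treats as not a directory.
    return [n for n in names if not os.path.isdir(environ.get(n, ""))]


problems = missing_build_dirs()
print("OK" if not problems else f"check these variables: {', '.join(problems)}")
```

Note that the exports only affect the current shell session, so the check needs to run from the same shell that will run pip.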

You should then just be able to install Ludwig from source, ideally into a fresh venv since it requires older versions of some libraries such as scikit-learn:

virtualenv ludwig_env
. ludwig_env/bin/activate
git clone https://github.com/ludwig-ai/ludwig.git
cd ludwig
pip install .

Note that most of the extra dependencies, including those for the different feature types, also require packages that are not yet available as binary wheels for the arm64 architecture and so will get built from source (which is fairly slow). Most of these extensions seem to work fine (for me at least), including audio, dask, hyperopt, ray, text and viz, as well as image if you have installed a Rust compiler. You can install the dependencies using the various requirements_*.txt files in the repo:

for x in audio dask hyperopt image ray text viz; do pip install -r requirements_${x}.txt; done
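
For anyone who prefers to drive this from Python, for example to skip files that are missing in a given checkout, here is a minimal sketch of the same loop. The helper name is hypothetical, and it only builds the commands rather than running them.

```python
import sys
from pathlib import Path

# Same extras as in the shell loop above.
EXTRAS = ("audio", "dask", "hyperopt", "image", "ray", "text", "viz")


def pip_commands(repo_dir=".", extras=EXTRAS):
    """Build one pip install -r command per requirements file that exists in repo_dir."""
    commands = []
    for extra in extras:
        req = Path(repo_dir) / f"requirements_{extra}.txt"
        if req.is_file():
            # Using sys.executable keeps pip tied to the active interpreter/venv.
            commands.append([sys.executable, "-m", "pip", "install", "-r", str(req)])
    return commands


for cmd in pip_commands():
    # To actually install, pass each list to subprocess.run(cmd, check=True).
    print(" ".join(cmd))
```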

The dependencies for server require neuropod, which seems large and complex. It requires Bazel to build, and I've not tried since I don't need it at this time.

I have also been unable to build horovod on my machine but have not investigated deeply since I don't need it.

Hopefully as time goes by more people will start building Python wheels on ARM as well as Intel Macs and some of these troubles will go away in the future. Until then, I hope this helps!

@w4nderlust
Collaborator

@nickovs thank you very much for sharing your solution! It's very very appreciated!

@justinxzhao
Collaborator

@nickovs Awesome write up, I was able to get your solution working on my computer for torch.

I filed a separate issue #1671 for a few remaining packages that still don't work, which also applies to a related conda-based torch installation.

@w4nderlust
Collaborator

Adding to the thread: the method specified above by @nickovs fails for me if I use Python 3.9 on an M1.
The reasons are the skimage and sklearn versions and torchvision.
Unpinning the sklearn version fixes it (fix already in master).
skimage is basically only used for one function, imsave, so we'll likely replace it with the torchvision equivalent and get rid of that dependency.
In my setup, torchvision raises a runtime error when trying to read images: Arguments: (RuntimeError('No such operator image::read_file'),)
I guess we still need to be a bit patient to get to easy arm64 support :)

@dantreiman
Collaborator

Building horovod for M1 works if you disable eigen vectorization:

`CXXFLAGS="-DEIGEN_DONT_VECTORIZE" HOROVOD_WITH_PYTORCH=1 pip install "horovod[pytorch]"`

I ran into this again trying to install the latest master on my M1. Related: #2282

@dalianaliu added this to Needs triage in Issue Tracking via automation Jul 27, 2022
@dalianaliu moved this from Needs triage to In progress in Issue Tracking Jul 27, 2022
@connor-mccorm
Collaborator

Closing due to issue being resolved. Will add a docs page on how to set up Ludwig on M1 and will add a comment here linking to the doc.

Issue Tracking automation moved this from In progress to Resolved Jul 28, 2022
@rudolfolah
Contributor

Any update on this?

PyTorch seems to have better support for M1 now and it looks like it's activated when I checked:

import torch

print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())

Is there some code needed in the initialize_pytorch to get it to use MPS? https://github.com/ludwig-ai/ludwig/blob/master/ludwig/utils/torch_utils.py#L251

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    mps_device = torch.device("mps")

Source of this code: https://pytorch.org/docs/stable/notes/mps.html
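
For illustration, a device-selection helper built on the same checks might look like the sketch below. This is a hypothetical function, not Ludwig's actual initialize_pytorch or any other Ludwig API, and it falls back to "cpu" when PyTorch is missing or was built without MPS support.

```python
try:
    import torch
except ImportError:  # keeps the sketch runnable on machines without PyTorch
    torch = None


def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what the local PyTorch build reports."""
    if torch is None:
        return "cpu"
    # torch.backends.mps is absent on older PyTorch releases, hence the getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(pick_device())
```

The returned string can be passed to torch.device(...) when PyTorch is installed.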

rudolfolah added a commit to rudolfolah/ludwig that referenced this issue Dec 18, 2022
When MPS is available as a backend for PyTorch, returns "mps"

ludwig-ai#1101
@rudolfolah
Contributor

I added the MPS device check to the code: master...rudolfolah:ludwig:patch-2

It ran in ~3.3 min using MPS, in contrast it was running in ~6.5 min when using CPU.

Unfortunately, the issue I ran into is that the model did not work correctly. It returned a completely incorrect result and included NaN as part of the output. There's a PyTorch issue opened here for the warning message: pytorch/pytorch#87221

/Users/rudolfo/Workspace/ludwig-code-gen/env/lib/python3.9/site-packages/torchmetrics/aggregation.py:83: UserWarning: Encounted `nan` values in tensor. Will be removed.
  warnings.warn("Encounted `nan` values in tensor. Will be removed.", UserWarning)
/Users/rudolfo/Workspace/ludwig/ludwig/utils/metric_utils.py:37: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  result += [data[(partitions == i).nonzero().squeeze(1)]]
# using MPS
source_code                             function world(a) { return a+1-1+1 }
test_code                                        expect(world(1)).toEqual(2)
test_code_predictions      [form, form, form, form, form, form, form, for...
test_code_probabilities    [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...
test_code_probability                                                    NaN
Name: 0, dtype: object
# using CPU
source_code                             function world(a) { return a+1-1+1 }
test_code                                        expect(world(1)).toEqual(2)
test_code_predictions      [<SOS>, expect, (, addone, (, 1, ), ), ., toeq...
test_code_probabilities    [1.0, 1.0, 1.0, 0.77051467, 1.0, 1.0, 1.0, 1.0...
test_code_probability                                              -0.260697
Name: 0, dtype: object
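
A quick way to catch this kind of failure automatically is to scan the predicted probabilities for NaN before trusting a run on a new backend. A minimal sketch with a hypothetical helper; the two example lists only reuse the first few visible entries of the truncated outputs above, not the full data.

```python
import math


def nan_positions(values):
    """Return the indices of NaN entries in a flat sequence of floats."""
    return [i for i, v in enumerate(values) if isinstance(v, float) and math.isnan(v)]


# First four visible entries of each truncated probabilities list above.
mps_like = [float("nan")] * 4
cpu_like = [1.0, 1.0, 1.0, 0.77051467]

print("MPS-like NaN positions:", nan_positions(mps_like))
print("CPU-like NaN positions:", nan_positions(cpu_like))
```

Any non-empty result on one device and an empty result on another, for the same model and input, points at the backend rather than the model.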

@w4nderlust
Collaborator

Thanks for the update, @rudolfolah. I guess we'll just need to wait for the PyTorch issue to be resolved, unfortunately.

@rudolfolah
Contributor

This could be a possibility: Apple provides tools to convert already trained PyTorch models to Core ML: https://github.com/apple/coremltools

This doesn't solve the issue when training a model but if testing out already trained models, it could be helpful as part of the pipeline for local development:

  1. download and load model
  2. convert with CoreML Tools to Core ML Model Format
  3. make predictions

I don't think it's something Ludwig needs to support directly, though it could be mentioned in the documentation for Mac install instructions.
