Mac M1 Support #1101

camaya7 · 2021-02-19T22:10:21Z

Hi, I'm wondering if Ludwig currently has support for the Mac M1? I have tried to install it several times through the steps on the website and no luck.

I downloaded TensorFlow 2.4.0-rc0 (the only version available for M1) separately, as the install wasn't getting anywhere through the "pip install ludwig" command. I kept getting dependency conflict errors and corrected most of them, but there seems to be no workaround for TF; see the conflicts below.

The conflict is caused by:
ludwig 0.3.3 depends on tensorflow>=2.3.1
ludwig 0.3.2 depends on tensorflow>=2.3.1
ludwig 0.3.1 depends on tensorflow>=2.2
ludwig 0.3 depends on tensorflow>=2.2
ludwig 0.2.2.8 depends on tensorflow==1.15.3
ludwig 0.2.2.7 depends on tensorflow==1.15.3
ludwig 0.2.2.6 depends on tensorflow==1.15.2
ludwig 0.2.2.5 depends on tensorflow==1.15.2
ludwig 0.2.2.4 depends on tensorflow==1.15.2
ludwig 0.2.2.3 depends on tensorflow==1.15.2
ludwig 0.2.2.2 depends on tensorflow-gpu==1.15.2
ludwig 0.2.2 depends on tensorflow-gpu==1.15.2
ludwig 0.2.1 depends on tensorflow==1.14.0
ludwig 0.2 depends on tensorflow==1.14.0
ludwig 0.1.2 depends on tensorflow==1.13.1
ludwig 0.1.1 depends on tensorflow==1.13.1
ludwig 0.1.0 depends on tensorflow>=1.12

To fix this you could try to:

loosen the range of package versions you've specified
remove package versions to allow pip attempt to solve the dependency conflict

It seems M1's limited TF availability is not letting the Ludwig install get through the TF dependencies.

Expected behavior
Successfully install Ludwig.

Environment (please complete the following information):

OS: Big Sur 11.2.1
Python 3.8
Ludwig 0.3.3

Thanks

tgaddair · 2021-02-21T17:17:54Z

Hey @camaya7, can you share the full log when you attempt to run pip install ludwig? Also, do you already have TensorFlow installed before you attempt to install Ludwig?

There shouldn't be any incompatibility between Ludwig and this version of TensorFlow. The issues are likely arising from the way TensorFlow names its pip packages for specific hardware, which often conflict with the standard package names. It's usually something we can work around by either letting Ludwig install first and installing TensorFlow afterwards, or (in the worst case) by installing Ludwig from source and removing the tensorflow dependency from the requirements.txt file.
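One way to see the naming mismatch described above is to ask Python which TensorFlow distributions are actually installed. This is only a sketch: the candidate names are assumptions (`tensorflow-macos` is inferred from Apple's tensorflow_macos fork referenced later in this thread), and `installed_tf_distributions` is a hypothetical helper, not part of Ludwig or pip.

```python
from importlib import metadata

# Candidate distribution names; these are assumptions for illustration only.
CANDIDATE_NAMES = ("tensorflow", "tensorflow-macos", "tensorflow-gpu")


def installed_tf_distributions(names=CANDIDATE_NAMES):
    """Return {distribution name: version} for each candidate that is installed."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Nothing is installed under this name, so pip cannot use it to
            # satisfy a requirement written against this name.
            continue
    return found


if __name__ == "__main__":
    found = installed_tf_distributions()
    if "tensorflow" in found:
        print("tensorflow", found["tensorflow"])
    else:
        print("No distribution named 'tensorflow' is installed; found:", found)
```

If the only entry that comes back is a hardware-specific name, a requirement on `tensorflow` will still look unsatisfied to pip even though `import tensorflow` works.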

camaya7 · 2021-02-22T22:31:05Z

Hey @tgaddair, attached is the log, not sure if this is what you're looking for as it's a bit long.

screen.log

TensorFlow was installed as TensorFlow 2.4.0-rc0. For a while I tried installing Ludwig first and then TF, but ended up with the same problem with conflicting dependencies. Please let me know what you think here. In the worst case I could try installing Ludwig from source without the tensorflow dependency in the requirements.txt file.

w4nderlust · 2021-02-23T01:29:26Z

Ludwig v0.3.3 supports tensorflow>=2.3.1 so in theory this should work, not really sure what is causing the issue.
One thing I would suggest trying is to clone ludwig locally, modify requirements.txt to contain tensorflow==2.4.0-rc0, and install by running: pip uninstall ludwig && pip install . from the ludwig directory.
Let us know if this helps as a workaround.

camaya7 · 2021-02-23T21:27:57Z

Hi @w4nderlust, I've tried the steps you suggested and got the following error message:
ERROR: Could not find a version that satisfies the requirement tensorflow==2.4.0-rc0 (from ludwig)
ERROR: No matching distribution found for tensorflow==2.4.0-rc0

This is despite the same TensorFlow version being active in the same virtual environment as Ludwig. I also ran the install without the TensorFlow dependency and got a similar error:
ERROR: Could not find a version that satisfies the requirement tfa-nightly==0.12.0.dev20201215223743 (from ludwig)
ERROR: No matching distribution found for tfa-nightly==0.12.0.dev20201215223743

tgaddair · 2021-02-28T02:37:59Z

Hey @camaya7, you may want to try unpinning tfa-nightly and installing the latest version. It appears that specific version is not available with M1.

camaya7 · 2021-03-05T20:49:29Z

Hey @tgaddair, I've unpinned tfa-nightly and have tried installing the latest version separately yet I keep running into the same error:
ERROR: Could not find a version that satisfies the requirement tfa-nightly
ERROR: No matching distribution found for tfa-nightly
It may have to do with the TF connection to tfa-nightly but I haven't quite found the workaround.

w4nderlust · 2021-03-05T20:57:10Z

@camaya7 debugging this is a bit tricky because of the specific machine, sorry about that. I'm wondering: can you, in general, independently of Ludwig, install a tf + tfa combination that works on M1? In case you can find such a combination, then you should be able to use that in Ludwig.

Alternatively, you may remove the tfa dependency from Ludwig, as it is used specifically in a single module (sequence_decoders.py). This will make it so you won't be able to do sequence and text generation, but at least you may use all the other features of Ludwig.
Sorry for the inconvenience. I believe M1 support will improve in the future for TF and, as a consequence, for Ludwig.
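For anyone who wants to script the "remove the dependency from requirements.txt" step rather than edit the file by hand, here is a minimal sketch. `drop_requirements` is a hypothetical helper written for this thread, and the package names passed to it (`tensorflow`, `tfa-nightly`) are taken from the pip errors above; it only handles simple one-requirement-per-line files.

```python
import re


def drop_requirements(text, packages):
    """Return requirements text with the lines for the given packages removed."""
    blocked = {p.lower().replace("_", "-") for p in packages}
    kept = []
    for line in text.splitlines():
        # The requirement name is the leading run of letters, digits, '.', '_' or '-'.
        match = re.match(r"[A-Za-z0-9._-]+", line.strip())
        name = match.group(0).lower().replace("_", "-") if match else ""
        if name in blocked:
            continue  # skip the pinned dependency we cannot install
        kept.append(line)  # comments and blank lines are kept unchanged
    return "\n".join(kept) + "\n"


if __name__ == "__main__":
    sample = "numpy>=1.15\ntensorflow>=2.3.1\ntfa-nightly==0.12.0.dev20201215223743\n"
    print(drop_requirements(sample, ["tensorflow", "tfa-nightly"]), end="")
```

The filtered text can then be written back to requirements.txt before running pip install . from the source tree. As noted above, dropping tfa means sequence and text generation will not work.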

camaya7 · 2021-03-05T21:15:09Z

@w4nderlust it's okay, the tech needs to catch up. So far I've only found a TF build that works on M1 but not a tf + tfa combination. I'm going to continue searching for one so that I could use it with Ludwig--in that case would I just install Ludwig without the tf, tfa-nightly dependencies per the requirements.txt file?

I'm also going to try to remove the tfa dependency altogether as I wouldn't need sequence and text gen for now either way. Hopefully this gets me some progress with the install.

No worries, I look forward to seeing M1 improvements for TF and Ludwig.

carlogrisetti · 2021-05-23T08:04:19Z

@camaya7 I have had the same pip behavior recently (cascading between different package versions and then ultimately failing).
The solution was to uninstall the package I was trying to install and reinstall it, and/or to install with pip install ludwig --force, which tells pip to reinstall all packages involved.

In my case there was some cache/temp corruption, and this fixed it.

camaya7 · 2021-05-27T20:14:59Z

@carlogrisetti Hi Carlo, thanks so much for the tip, you're the man. It installed Ludwig correctly for the most part.

I'm dealing with a "zsh: illegal hardware instruction" error whenever I try to run anything like ludwig train or ludwig experiment. Any idea what this could be? I've installed TensorFlow 2.4.0-rc0 separately in case that's what was missing, but nothing yet.

carlogrisetti · 2021-05-28T05:38:25Z

This may help you
apple/tensorflow_macos#143

camaya7 · 2021-06-10T21:33:43Z

@carlogrisetti thanks for the tips. In an attempt to resolve the previous issue, I uninstalled ludwig and haven't been able to get it up and running again. It seems my cache is corrupt and there are conflicts with Tensorflow and other packages. Tensorflow-2.4.0-rc0 throws "Could not find a version that satisfies the requirement" errors so I've removed it from the requirements.txt file although I don't know how this will affect running ludwig later as it happened before as mentioned above.

Running pip install ludwig --force no longer works. Any advice here? It'd be highly appreciated.

w4nderlust · 2021-06-10T21:47:42Z

Debugging package dependency issues is always tricky. Would creating a new virtualenv be an option? That may be the most straightforward way.

camaya7 · 2021-06-14T21:58:01Z

@w4nderlust Thanks, I tried out a new virtualenv and it helped in installing ludwig and its packages but I'm still getting an "illegal hardware instruction" error once I try to train a model, even with TF correctly installed. I'm getting the feeling I'm going to have to use another machine...

amholler · 2021-07-02T17:25:28Z

FWIW, here's how I set up my mac m1 for ludwig.
https://docs.google.com/document/d/1Q1t9x4GN9rMnilMt2MohaiplmjUPOTcBkS2KpeT_p3s/edit?usp=sharing

camaya7 · 2021-07-03T19:29:50Z

@amholler Thanks for this, I'm going to give it a shot and report with results.

amholler · 2021-07-04T00:01:41Z

Sounds good, @camaya7. BTW, I updated the google doc to indicate that I just successfully got through the first epoch of my experiment of running "python train_higgs_small.py" with the update to reduce eval_batch_size. Woohoo.
The less good news is that my activity monitor memory status is yellow; AFAICT, the python train is swapping. Maybe I should reduce the batch size. I have a 16G M1; how about you?

amholler · 2021-07-06T19:55:29Z

Hmm, @camaya7 not sure what the issues are. Just to verify, you did the steps in my google doc
under "Install Tensorflow" into a conda environment and you were on the step of trying to run
"pip3 install ." from a local copy of the ludwig source tree? What error did you see?

camaya7 · 2021-07-06T20:03:49Z

@amholler, yeah, I followed the steps in setting up the conda environment and got to the "pip install ." step from the source tree. Then I ran into a bunch of errors from individual packages not installing correctly and so on. So I installed all the ludwig reqs separately and ludwig installed all the way through. However, when I try to run a model, I'm now getting the error: "illegal hardware instruction ludwig train". I think it's a problem with TF. I'm checking but I can't find what I missed from your instructions.

amholler · 2021-07-06T20:10:00Z

yeah, I believe that that means you are running the wrong version of TF.
Does your "conda list" output look like mine (in the last section of the doc)?
Did you install TF by running the following in your conda environment?
pip install --upgrade --force --no-dependencies https://github.com/apple/tensorflow_macos/releases/download/v0.1alpha3/tensorflow_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl https://github.com/apple/tensorflow_macos/releases/download/v0.1alpha3/tensorflow_addons_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl
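A rough sanity check that may help here, assuming the problem is a mismatch between the interpreter and the wheel: compare the tags in the wheel filename with the Python version and CPU architecture the interpreter reports. The helper names below are hypothetical, and this is a simplification; pip does the real matching with fuller tag logic.

```python
import platform
import sys


def parse_wheel_tags(wheel_filename):
    """Split a wheel filename into (python_tag, abi_tag, platform_tag)."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError(f"not a wheel filename: {wheel_filename!r}")
    stem = wheel_filename[: -len(".whl")]
    # The last three dash-separated fields are the python, abi and platform tags.
    python_tag, abi_tag, platform_tag = stem.split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def interpreter_tags():
    """Return (python_tag, machine) for the running interpreter."""
    python_tag = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return python_tag, platform.machine()


def wheel_matches(wheel_filename, python_tag, machine):
    """True if the wheel's python tag and architecture suffix match."""
    wheel_python, _abi, wheel_platform = parse_wheel_tags(wheel_filename)
    return wheel_python == python_tag and wheel_platform.endswith("_" + machine)


if __name__ == "__main__":
    wheel = "tensorflow_macos-0.1a3-cp38-cp38-macosx_11_0_arm64.whl"
    python_tag, machine = interpreter_tags()
    print(python_tag, machine, wheel_matches(wheel, python_tag, machine))
```

For the wheels above, the interpreter would need to report cp38 and arm64. An interpreter that reports x86_64 on an M1 is typically running under Rosetta and would not match.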

camaya7 · 2021-07-07T00:10:04Z

@amholler, I was pretty much running into the same errors as you so I followed through with your file, thanks so much for documenting it all. Ludwig is finally properly installed on my machine.

jimthompson5802 · 2021-07-16T22:09:46Z

I just noticed this on Anaconda's blog re: Apple's M1 chip support. It was published yesterday, 15 Jul.
https://www.anaconda.com/blog/apple-silicon-transition

Conclusion of the posting

w4nderlust · 2021-07-16T22:50:06Z

To be honest, I don't agree with that assessment.
It is true that today (July 2021) doing data science on an M1 Mac is more difficult than doing it on an Intel Mac, but the reason is not that the machines are "not aimed at the data science and scientific user yet". The reason is that most of the data science stack in Python relies on libraries with heavy C/C++ bindings that need to be compiled for ARM64, and the developers of these libraries have not released support for this architecture yet, 9 months after the introduction of the M1-based machines.
It's the Python ecosystem that is not ready, not the machines.
I believe the situation will improve quickly, in particular because ARM is making it into the data center and into other consumer products (Windows laptops with Qualcomm processors, Samsung getting into the ARM chip market, the Grace architecture from Nvidia and so on), so the maintainers of those libraries will likely need to adapt and release ARM-compatible versions. Because of the many cores on these processors, I expect them to actually be much better than current Intel-based ones for data science tasks (benchmarks of the custom M1-compiled TF already suggest that).

From the Ludwig point of view, we rely on tensorflow at the moment for the tensor computation, so the sooner ARM support becomes a first-class citizen in TF, the sooner we'll be able to provide a seamless experience on those machines. Until then, all we can do is share experience and workarounds like @amholler did, so that M1 (and ARM in general) users can use Ludwig.

redwrasse · 2021-09-09T23:30:04Z

Out of curiosity, has anyone figured out a better way of building ludwig on an m1 at this point? I'm dealing with this same class of problems on other projects.

luisrh01 · 2021-10-14T13:35:39Z

Hi. Still having issues, I tried the process mentioned above in the DOCX file, but since Conda now uses TF2.6, there are conflicts with package versions for Ludwig… has anyone cracked the code on this?

tgaddair · 2021-10-15T04:58:53Z

Hey @luisrh01, you can try using TF 2.6 with Ludwig, most things should work, though we ran into a few issues related to distributed training (so if you're not doing distributed training, it should be generally supported). We'll also be completing a migration to PyTorch in the next few weeks, which could help simplify things here as well.

nickovs · 2022-01-11T00:30:19Z

I managed to get (the newer, PyTorch-based) Ludwig going on my M1 Pro without too much trouble. The following assumes that you have a fully installed Xcode with command line tools set up and that you have Homebrew installed.

First there are a couple of key libraries that you'll need to install: hdf5 and openblas.

brew install openblas hdf5

Also, if you want to use the image feature type (which many people will) then it turns out you also need a Rust compiler, since one of its dependencies is tokenizers, which is a wrapper on top of a library written in Rust. You can install this with Homebrew too:

brew install rust

You will also need to tell the Python packages where to find these libraries since they seem not to properly use pkgconfig:

export OPENBLAS=$(/opt/homebrew/bin/brew --prefix openblas)
export HDF5_DIR=$(/opt/homebrew/bin/brew --prefix hdf5)

You should then just be able to install Ludwig from source, ideally into a fresh venv since it requires older versions of some libraries such as scikit-learn:

virtualenv ludwig_env
. ludwig_env/bin/activate
git clone https://github.com/ludwig-ai/ludwig.git
cd ludwig
pip install .

Note that most of the extra dependencies, including those for the different feature types, also require packages that are not yet available as binary wheels for the arm64 architecture and so will get built from source (which is fairly slow). Most of these extensions seem to work fine (for me at least), including audio, dask, hyperopt, ray, text and viz, as well as image if you have installed a Rust compiler. You can install the dependencies using the various requirements_*.txt files in the repo:

for x in audio dask hyperopt image ray text viz; do pip install -r requirements_${x}.txt; done
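Since several of these builds are slow, it can be worth checking up front that the command-line tools the steps above rely on are on the PATH. A minimal sketch, where `missing_tools` is a hypothetical helper and the tool list is my reading of the steps (Homebrew, git, the Rust compiler from brew install rust, and virtualenv):

```python
import shutil

# Tools used by the steps above; adjust to the steps you actually run.
REQUIRED_TOOLS = ("brew", "git", "rustc", "virtualenv")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from the list that cannot be found on the PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("All required tools found.")
```

This only checks that the executables exist; it does not verify that OPENBLAS and HDF5_DIR are exported in the current shell.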

The dependencies for server require neuropod, which seems large and complex. It requires Bazel to build, and I've not tried since I don't need it at this time.

I have also been unable to build horovod on my machine but have not investigated deeply since I don't need it.

Hopefully as time goes by more people will start building Python wheels on ARM as well as Intel Macs and some of these troubles will go away in the future. Until then, I hope this helps!

w4nderlust · 2022-01-11T01:24:12Z

@nickovs thank you very much for sharing your solution! It's very very appreciated!

justinxzhao · 2022-01-12T01:08:00Z

@nickovs Awesome write up, I was able to get your solution working on my computer for torch.

I filed a separate issue #1671 for a few remaining packages that still don't work, which also applies to a related conda-based torch installation.

w4nderlust · 2022-01-18T00:26:20Z

Adding to the thread: the method specified above by @nickovs fails for me if I use Python 3.9 on an M1.
The reasons are: skimage and sklearn version and torchvision.
Unpinning the sklearn version fixes it (fix already in master).
skimage is basically only used for one function, imsave, so we'll likely replace it with the torchvision equivalent and get rid of that dependency.
torchvision when trying to read images in my setup returns a runtime error: Arguments: (RuntimeError('No such operator image::read_file'),)
I guess we still need to be a bit patient to get to easy arm64 support :)

dantreiman · 2022-07-14T21:40:07Z

Building horovod for M1 works if you disable eigen vectorization:

`CXXFLAGS="-DEIGEN_DONT_VECTORIZE" HOROVOD_WITH_PYTORCH=1 pip install "horovod[pytorch]"`

I ran into this again trying to install the latest master on my M1. Related: #2282

connor-mccorm · 2022-07-28T20:37:16Z

Closing due to issue being resolved. Will add a docs page on how to set up Ludwig on M1 and will add a comment here linking to the doc.

rudolfolah · 2022-12-17T19:28:11Z

Any update on this?

PyTorch seems to have better support for M1 now and it looks like it's activated when I checked:

import torch

print(torch.backends.mps.is_available())
print(torch.backends.mps.is_built())

Is there some code needed in the initialize_pytorch to get it to use MPS? https://github.com/ludwig-ai/ludwig/blob/master/ludwig/utils/torch_utils.py#L251

if not torch.backends.mps.is_available():
    if not torch.backends.mps.is_built():
        print("MPS not available because the current PyTorch install was not "
              "built with MPS enabled.")
    else:
        print("MPS not available because the current MacOS version is not 12.3+ "
              "and/or you do not have an MPS-enabled device on this machine.")

else:
    mps_device = torch.device("mps")

Source of this code: https://pytorch.org/docs/stable/notes/mps.html
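Building on the snippet above, a device-selection helper could look roughly like the following. This is a sketch under assumptions, not Ludwig's actual initialize_pytorch: `pick_device` is a hypothetical name, preferring CUDA over MPS is my own choice, and the `find_spec` guard is only there so the example still runs where PyTorch is not installed.

```python
import importlib.util


def pick_device():
    """Return "cuda", "mps" or "cpu", depending on what this install supports."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed, so there is nothing to select.
        return "cpu"
    import torch

    if torch.cuda.is_available():
        return "cuda"
    # torch.backends.mps only exists in PyTorch builds that know about MPS.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_built() and mps.is_available():
        return "mps"
    return "cpu"


if __name__ == "__main__":
    print("Selected device:", pick_device())
```

The returned string can be passed to torch.device(...) in the same way as the "mps" literal in the snippet above.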

When MPS is available as a backend for PyTorch, returns "mps" ludwig-ai#1101

rudolfolah · 2022-12-18T02:17:53Z

I added the MPS device check to the code: master...rudolfolah:ludwig:patch-2

It ran in ~3.3 min using MPS, in contrast it was running in ~6.5 min when using CPU.

Unfortunately, the issue I ran into is that the model did not work correctly. It returned a completely incorrect result and included NaN as part of the output. There's a PyTorch issue open for the warning message: pytorch/pytorch#87221

/Users/rudolfo/Workspace/ludwig-code-gen/env/lib/python3.9/site-packages/torchmetrics/aggregation.py:83: UserWarning: Encounted `nan` values in tensor. Will be removed.
  warnings.warn("Encounted `nan` values in tensor. Will be removed.", UserWarning)
/Users/rudolfo/Workspace/ludwig/ludwig/utils/metric_utils.py:37: UserWarning: The operator 'aten::nonzero' is not currently supported on the MPS backend and will fall back to run on the CPU. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/mps/MPSFallback.mm:11.)
  result += [data[(partitions == i).nonzero().squeeze(1)]]

# using MPS
source_code                             function world(a) { return a+1-1+1 }
test_code                                        expect(world(1)).toEqual(2)
test_code_predictions      [form, form, form, form, form, form, form, for...
test_code_probabilities    [nan, nan, nan, nan, nan, nan, nan, nan, nan, ...
test_code_probability                                                    NaN
Name: 0, dtype: object

# using CPU
source_code                             function world(a) { return a+1-1+1 }
test_code                                        expect(world(1)).toEqual(2)
test_code_predictions      [<SOS>, expect, (, addone, (, 1, ), ), ., toeq...
test_code_probabilities    [1.0, 1.0, 1.0, 0.77051467, 1.0, 1.0, 1.0, 1.0...
test_code_probability                                              -0.260697
Name: 0, dtype: object
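Until the MPS issue is fixed, one defensive option is to check the returned probabilities for NaN and rerun on CPU when they show up. The sketch below shows only that control flow: `predict_fn` is a hypothetical callable standing in for whatever runs the model on a given device (it is not a Ludwig API), and the device order is an assumption.

```python
import math


def has_nan(values):
    """True if any float in the sequence is NaN."""
    return any(isinstance(v, float) and math.isnan(v) for v in values)


def predict_with_fallback(predict_fn, devices=("mps", "cpu")):
    """Try each device in order and return (device, probabilities)
    for the first result that contains no NaN values."""
    for device in devices:
        probabilities = predict_fn(device)
        if not has_nan(probabilities):
            return device, probabilities
    raise ValueError("every device produced NaN probabilities")


if __name__ == "__main__":
    # Stand-in for a real prediction call; mimics the behaviour reported above.
    def fake_predict(device):
        if device == "mps":
            return [float("nan"), float("nan")]
        return [1.0, 0.77]

    print(predict_with_fallback(fake_predict))
```

This costs a second run whenever MPS misbehaves, so it is only useful as a stopgap for local experiments.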

w4nderlust · 2023-01-11T17:55:09Z

Thanks for the update, @rudolfolah. I guess we'll just need to wait for the PyTorch issue to be resolved, unfortunately.

rudolfolah · 2023-01-29T18:58:50Z

This could be a possibility, Apple has provided tools to convert already trained PyTorch models to CoreML: https://github.com/apple/coremltools

This doesn't solve the issue when training a model but if testing out already trained models, it could be helpful as part of the pipeline for local development:

download and load model
convert with CoreML Tools to Core ML Model Format
make predictions

I don't think it's something Ludwig needs to support directly, though it could be mentioned in the documentation for Mac install instructions.

w4nderlust added the dependencies and waiting for answer labels Feb 23, 2021

htahir1 mentioned this issue Jul 22, 2021

[BUG] Mac M1 environment unable to use zenml CLI zenml-io/zenml#93

Closed

justinxzhao mentioned this issue Jan 12, 2022

Torch tests that depend on image and audio packages don't work on Apple M1 ARM64 #1671

Open

dalianaliu added this to Needs triage in Issue Tracking via automation Jul 27, 2022

dalianaliu moved this from Needs triage to In progress in Issue Tracking Jul 27, 2022

connor-mccorm closed this as completed Jul 28, 2022

Issue Tracking automation moved this from In progress to Resolved Jul 28, 2022

rudolfolah added a commit to rudolfolah/ludwig that referenced this issue Dec 18, 2022

torch_utils: MPS device support

825850f

When MPS is available as a backend for PyTorch, returns "mps" ludwig-ai#1101
