### Search before asking

- [x] I have searched the Multimodal Maestro [issues](https://github.com/roboflow/multimodal-maestro/issues) and found no similar feature requests.

### Question

Please add support for Phi-4 multimodal. I want to train a combined speech, image, and text model.

### Additional

_No response_