Parallel/Stream processing of Apple Intelligence

I have built a macOS machine-intelligence application that uses Apple Intelligence. Part of the application preprocesses text, and for longer content I have implemented chunking to stay within the token limit. However, the application's performance is now limited by the fact that Apple Intelligence processes requests sequentially, which has a large impact.
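For context, a minimal sketch of the kind of chunking described, splitting on paragraph boundaries under a rough size budget. The `maxChars` parameter is a hypothetical stand-in for the model's real token limit, since tokens are not directly countable from here:

```swift
import Foundation

// Split text into chunks no larger than maxChars, breaking on
// paragraph boundaries. maxChars is a hypothetical stand-in for
// the model's actual token budget.
func chunk(_ text: String, maxChars: Int) -> [String] {
    var chunks: [String] = []
    var current = ""
    for paragraph in text.components(separatedBy: "\n\n") {
        if current.isEmpty {
            current = paragraph
        } else if current.count + paragraph.count + 2 <= maxChars {
            current += "\n\n" + paragraph
        } else {
            chunks.append(current)
            current = paragraph
        }
    }
    if !current.isEmpty { chunks.append(current) }
    return chunks
}
```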

Is there any approach to operating Apple Intelligence in a parallel mode, or even through a streaming interface? Since Apple Intelligence has Private Cloud Compute, I was hoping to send multiple chunks in parallel, as that would significantly improve performance.
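If parallel requests were supported, the natural shape would be a task group with one session per chunk. A sketch assuming the Foundation Models API (`LanguageModelSession`, `respond(to:)`); in practice the on-device model serializes these requests, which is exactly the limitation being asked about:

```swift
import FoundationModels

// Sketch: one session per chunk, submitted concurrently via a task
// group. Results are reassembled in chunk order. Note: the on-device
// model currently serializes requests, so this gains no throughput.
func summarizeChunks(_ chunks: [String]) async throws -> [String] {
    try await withThrowingTaskGroup(of: (Int, String).self) { group in
        for (index, chunk) in chunks.enumerated() {
            group.addTask {
                let session = LanguageModelSession()
                let response = try await session.respond(to: "Summarize:\n\(chunk)")
                return (index, response.content)
            }
        }
        var results = [String?](repeating: nil, count: chunks.count)
        for try await (index, text) in group {
            results[index] = text
        }
        return results.compactMap { $0 }
    }
}
```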

Any suggestions would be welcome. This could also be considered a request for a future enhancement.

Would you mind sharing which APIs you are using? If you are using the Foundation Models framework, please see the discussion here.

Best,
——
Ziqiao Chen
Worldwide Developer Relations.

Ziqiao,

Thanks, I am using the Foundation Models framework. I read the discussion you pointed to, thank you. I understand the limitation is on-device resources, but I was hoping that Private Cloud Compute could be leveraged once resources on the device were used up. It looks like I cannot really use Apple Intelligence for this and will need to move to other LLMs.
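On the streaming half of the question: the framework does expose streaming for a single request via `streamResponse(to:)`, which improves perceived latency even though it does not provide cross-chunk parallelism. A sketch, assuming each streamed element is a cumulative snapshot of the text generated so far:

```swift
import FoundationModels

// Stream one response: partial snapshots arrive as the model
// generates. This helps latency on a single chunk, but chunks
// themselves are still processed one at a time.
func streamSummary(of text: String) async throws {
    let session = LanguageModelSession()
    let stream = session.streamResponse(to: "Summarize:\n\(text)")
    for try await partial in stream {
        print(partial)  // cumulative snapshot so far
    }
}
```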
