Hello fellow developers,
I'm the founder of a FinTech startup, Cent Capital (https://cent.capital), where we are building an AI-powered financial co-pilot.
We're deeply exploring the Apple ecosystem to create a more proactive and ambient user experience. A core part of our vision is to use App Intents and the Shortcuts app to surface personalized financial insights without the user always needing to open our app. For example, suggesting a Shortcut like, "What's my spending in the 'Dining Out' category this month?" or having an App Intent proactively surface an insight like, "Your 'Subscriptions' budget is almost full."
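To make that concrete, here is the rough shape of what we're prototyping (a sketch only; CategorySpendingIntent and SpendingStore are placeholder names, not our actual implementation):

import AppIntents

// Sketch of an App Intent that answers a category-spending question on device.
// SpendingStore is a placeholder for a local, on-device data store.
struct CategorySpendingIntent: AppIntent {
    static var title: LocalizedStringResource = "Spending in Category This Month"

    @Parameter(title: "Category")
    var category: String

    func perform() async throws -> some IntentResult & ProvidesDialog {
        // Query an on-device store so sensitive data never leaves the device.
        let total = try await SpendingStore.shared.totalThisMonth(for: category)
        let formatted = total.formatted(.currency(code: "USD"))
        return .result(dialog: "You've spent \(formatted) on \(category) this month.")
    }
}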
My question for the community is about the architectural and user experience best practices for this.
How are you thinking about the balance between providing rich, actionable insights via Intents without being overly intrusive or "spammy" to the user?
What are the best practices for designing the data model that backs these App Intents for a complex domain like personal finance?
Are there specific performance or privacy considerations we should be aware of when surfacing potentially sensitive financial data through these system-level integrations?
We believe this is the future of FinTech apps on iOS and would love to hear how other developers are thinking about this challenge.
Thanks for your insights!
At WWDC25, Metal 4 introduced some exciting new features for machine learning optimization. As we all know, PyTorch's Metal Performance Shaders (MPS) backend is one of the most important tools for machine learning on the Mac, but the MPS introduction page doesn't show any information about Metal 4 support.
No matter what, LanguageModelSession always returns very lengthy, verbose responses. I set the maximumResponseTokens option to various small numbers, but it doesn't appear to have any effect. I've even used instructions telling it to keep responses between 3 and 8 words (roughly as sketched below), but it still returns multiple paragraphs. Is there a way to manage LLM response length? Thanks.
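For reference, roughly what I'm calling (a sketch; my actual instructions wording differs, and the 20-token cap is just an example):

import FoundationModels

// Sketch of my setup: instructions plus a small maximumResponseTokens cap,
// neither of which seems to shorten the output.
func shortAnswer(_ prompt: String) async throws -> String {
    let session = LanguageModelSession(instructions: "Answer in 3 to 8 words.")
    let options = GenerationOptions(maximumResponseTokens: 20)
    let response = try await session.respond(to: prompt, options: options)
    return response.content // still comes back as multiple paragraphs
}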
I'm trying to use Apple's new Visual Intelligence API to recommend content through screenshot image search. The problem I've encountered is that the SemanticContentDescriptor labels are either completely empty or super misleading, making it impossible to query for similar content in my app. Even the closest-matching example was inaccurate, returning the single label ["cardigan"] for a Supreme T-shirt.
I see other apps using this API like Etsy for example, and I'm wondering if they're using the input pixel buffer to query for similar content rather than using the labels?
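In case it's useful, the fallback I'm considering looks roughly like this (a sketch; the property names are as I understand the API, and ProductMatcher / Product are placeholders for my own search code):

import VisualIntelligence

// Sketch: prefer the labels when they exist, otherwise fall back to matching
// on the descriptor's pixel buffer. ProductMatcher and Product are placeholders.
func candidates(for descriptor: SemanticContentDescriptor) async throws -> [Product] {
    if !descriptor.labels.isEmpty {
        return try await ProductMatcher.search(labels: descriptor.labels)
    }
    guard let buffer = descriptor.pixelBuffer else { return [] }
    return try await ProductMatcher.search(pixelBuffer: buffer)
}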
If anyone has a similar experience or something that wasn't called out in the documentation please lmk! Thanks.
If I try to dynamically load WhisperKit's models, as below, the download never occurs. No error or anything. At the same time I can still reach the huggingface.co hosting site without any headaches, so it's not a blocking issue.
let config = WhisperKitConfig(
    model: "openai_whisper-large-v3",
    modelRepo: "argmaxinc/whisperkit-coreml"
)
let pipe = try await WhisperKit(config) // assuming the config is passed to the initializer like this; the download never starts
So I have to default to the tiny model, as seen below.
I have tried many ways, with help from ChatGPT and others, to build the models on my Mac, but hit too many failures, because I have never dealt with builds like that before.
Are there any hosting sites that already have the models (small, medium, large) built, where I can download them and just bundle them into my project? I've wasted quite a lot of time trying to get this done.
import Foundation
import WhisperKit

@MainActor
class WhisperLoader: ObservableObject {
    var pipe: WhisperKit?

    init() {
        Task {
            await self.initializeWhisper()
        }
    }

    private func initializeWhisper() async {
        do {
            Logging.shared.logLevel = .debug
            Logging.shared.loggingCallback = { message in
                print("[WhisperKit] \(message)")
            }

            let pipe = try await WhisperKit() // defaults to "tiny"
            self.pipe = pipe
            print("initialized. Model state: \(pipe.modelState)")

            guard let audioURL = Bundle.main.url(forResource: "44pf", withExtension: "wav") else {
                fatalError("not in bundle")
            }
            let result = try await pipe.transcribe(audioPath: audioURL.path)
            print("result: \(result)")
        } catch {
            print("Error: \(error)")
        }
    }
}
Hello,
I’m experiencing a severe performance degradation when running CoreML models on a live AVFoundation video feed compared to offline or synthetic inference. This happens across multiple models I've converted (including SCI, RTMPose, and RTMW) and affects multiple devices.
The Environment
OS: macOS 26.3, iOS 26.3, iPadOS 26.3
Hardware: Mac14,6 (M2 Max), iPad Pro 11 M1, iPhone 13 mini
Compute Units: cpuAndNeuralEngine
The Numbers
When testing my SCI_output_image_int8.mlpackage model, the inference timings are drastically different:
Synthetic/Offline Inference: ~1.34 ms
Live Camera Inference: ~15.96 ms
Preprocessing is completely ruled out as the bottleneck. My profiling shows total preprocessing (nearest-neighbor resize + feature provider creation) takes only ~0.4 ms in camera mode. Furthermore, no frames are being dropped.
What I've Tried
I am building a latency-critical app and have implemented almost every recommended optimization to try and fix this, but the camera-feed penalty remains:
Matched the AVFoundation camera output format exactly to the model input (640x480 at 30/60fps).
Used IOSurface-backed pixel buffers for everything (camera output, synthetic buffer, and resize buffer).
Enabled outputBackings.
Loaded the model once and reused it for all predictions.
Configured MLModelConfiguration with reshapeFrequency = .frequent and specializationStrategy = .fastPrediction (see the sketch after this list).
Wrapped inference in ProcessInfo.processInfo.beginActivity(options: .latencyCritical, reason: "CoreML_Inference").
Set DispatchQueue to qos: .userInteractive.
Disabled the idle timer and enabled iOS Game Mode.
Exported models using coremltools 9.0 (deployment target iOS 26) with ImageType inputs/outputs and INT8 quantization.
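For context, the model-loading side of the list above looks roughly like this (a sketch; compiledModelURL is a placeholder for the compiled .mlmodelc location):

import CoreML

// Sketch of the configuration described in the list above.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndNeuralEngine
config.optimizationHints.reshapeFrequency = .frequent
config.optimizationHints.specializationStrategy = .fastPrediction
let model = try await MLModel.load(contentsOf: compiledModelURL, configuration: config)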
Reproduction
To completely rule out UI or rendering overhead, I wrote a standalone Swift CLI script that isolates the AVFoundation and CoreML pipeline. The script clearly demonstrates the ~15ms latency on live camera frames versus the ~1ms latency on synthetic buffers.
(I have attached camera_coreml_benchmark.swift and the Core ML model (a very lightweight low-light enhancement model) to this GitHub repo: https://github.com/pzoltowski/apple-coreml-camera-latency-repro.)
My Question:
Is this massive overhead expected behavior for AVFoundation + Core ML on live feeds, or is this a framework/runtime bug? If expected, what is the Apple-recommended pattern to bypass this camera-only inference slowdown?
One thing I found interesting: when running in debug mode, the model was faster (not as fast as in the performance benchmark, but faster than 16 ms). Also, if I did some dummy calculation on a different DispatchQueue, the model seemed to get slightly faster. So maybe it's related to ANE power-state issues (jitter / SoC wake), with the ANE going to sleep too quickly and taking a long time to wake up? Doing a dummy calculation on a background thread is probably not a solution, though.
Thanks in advance for any insights!
Bear with me, please. Please make sure a highly skilled technical person reads and understands this.
I want to describe my vision for (AI/Algorithmically) Optimised Operating Systems. To explain it properly, I will describe the process to build it (pseudo).
Required Knowledge (no particular order): Processor Logic Circuits, LLM models, LLM tool usage, Python OO coding, Procedural vs OO, NLP fuzzy matching, benchmarking, canvas/artefacts/dynamic HTML interfaces, concepts of how AI models are vastly compressed and miniaturised forms of full data, Algorithmic vs AI.
First, take all OO Python code (example) on GitHub (example), then separate each function from each object into its own procedure (procedural logic) by making a logical procedural list of actions to perform only that function based on its entire dependency chain (i.e. all other objects it relies on). Relate all compiled functions using (for example) fuzzy matching on the name, or AI-based functional profiling to get multiple instances of each function.
Starting with the most used function, test each one against the others that perform the same task for bugs and completeness. Determine the fastest, most optimal version of that function (and every function). Add a single instance of each most optimal function to the centralised tool codebase, which will later be utilised by the language models. This ensures we rely only on the most optimised function for each and every use case — with every program using one shared instance of that function instead of compiling it separately.
In this way, we optimise all existing (Python OO) code based on maximum availability of certain resources. Use the optimised code to teach the AI how to map each user task directly to optimised code (i.e. each user-facing function in unoptimised program code to its optimised version). Later, teach the codebase how to optimise these functions directly to the logic circuits on the processor — as an Operating System — using essentially the same process. Eventually, use the most optimised form for tasks via a dynamic canvas/artefacts UI interface generated by the Start Menu using the language model.
Allow custom interfaces to emerge dynamically from the available functions based on language understanding. Grow and optimise the algorithmic codebase until the entire operating system — with versioning — evolves based on the AI-generated user interfaces (compressing them into new versions of the tools codebase).
Please don’t give up until you understand this concept completely and nothing in the description is beyond your full comprehension. It’s a highly significant step forward.
I have an app that uses a couple of mlmodels (word tagger and gazetteer) and I’m trying to encrypt them before publishing.
The models are part of a package. I understand that Xcode can’t automatically handle the encryption for a model in a package the way it can within a traditional app structure.
Given that, I’ve generated the Apple MLModel encryption key from Xcode and am encrypting via the command line with:
xcrun coremlcompiler compile Gazetteer.mlmodel GazetteerENC.mlmodelc --encrypt Gazetteerkey.mlmodelkey
In the package manifest, I’ve listed the encrypted models as .copy resources for my target and have verified the URL to that file is good.
When I try to load the encrypted .mlmodelc file (on a physical device) with the line:
gazetteer = try NLGazetteer(contentsOf: gazetteerURL!)
I get the error:
Failed to open file: /…/Scanner.bundle/GazetteerENC.mlmodelc/coremldata.bin. It is not a valid .mlmodelc file.
So my questions are:
Does the NLGazetteer class support encrypted MLModel files?
Given that my models are in a package, do I have the right general approach?
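For what it's worth, the next thing I plan to try is loading the encrypted model directly through Core ML's async API, which, as I understand it, is the path that can fetch the decryption key, just to confirm whether the compiled model itself is usable independent of NLGazetteer. A sketch:

import CoreML

// Sketch: load the encrypted .mlmodelc directly to see whether Core ML accepts it.
let mlModel = try await MLModel.load(contentsOf: gazetteerURL!, configuration: MLModelConfiguration())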
Thanks for any help or thoughts.
Hi there,
I have a custom keypoint-detection model and want to use it via Vision's CoreMLRequest API. There are a couple of complications with the input and output:
For input: my model expects a 512x512 image, which is resized and padded from a 1920x1080 frame. I use the .scaleToFit option, but can I also specify the color used for padding?
For output: my model produces a CoreMLFeatureValueObservation. Can I have it output in a format Vision recognizes, such as joints/keypoints?
If my model can output in a format Vision recognizes, will Vision take care of restoring the coordinates back to the original frame (undoing the padding)? If not, how do I restore them from the .scaleToFit option?
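In case it helps frame the question, this is the inverse transform I'd otherwise write by hand (a sketch; it assumes the aspect-fit image is centered in the 512x512 input, which I haven't verified for .scaleToFit):

import CoreGraphics

// Map a keypoint from model-input pixels back to the original frame,
// undoing an aspect-fit resize plus (assumed centered) padding.
func unpadPoint(_ p: CGPoint,
                modelSize: CGSize = CGSize(width: 512, height: 512),
                frameSize: CGSize = CGSize(width: 1920, height: 1080)) -> CGPoint {
    // Scale used to fit the frame inside the model input while preserving aspect ratio
    let scale = min(modelSize.width / frameSize.width, modelSize.height / frameSize.height)
    let fitted = CGSize(width: frameSize.width * scale, height: frameSize.height * scale)
    // Padding added around the fitted image (split evenly here)
    let padX = (modelSize.width - fitted.width) / 2
    let padY = (modelSize.height - fitted.height) / 2
    // Remove the padding, then undo the scale
    return CGPoint(x: (p.x - padX) / scale, y: (p.y - padY) / scale)
}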
Best,
Environment:
macOS 26.2 (Tahoe)
Xcode 16.3
Apple Silicon (M4)
Sandboxed Mac App Store app
Description:
Repeated use of VNRecognizeTextRequest causes permanent memory growth in the host process. The physical footprint increases by approximately 3-15 MB per OCR call and never returns to baseline, even after all references to the request, handler, observations, and image are released.
private func selectAndProcessImage() {
    let panel = NSOpenPanel()
    panel.allowedContentTypes = [.image]
    panel.allowsMultipleSelection = false
    panel.canChooseDirectories = false
    panel.message = "Select an image for OCR processing"
    guard panel.runModal() == .OK, let url = panel.url else { return }

    selectedImageURL = url
    isProcessing = true
    recognizedText = "Processing..."

    // Run OCR on a background thread to keep UI responsive
    let workItem = DispatchWorkItem {
        let result = performOCR(on: url)
        DispatchQueue.main.async {
            recognizedText = result
            isProcessing = false
        }
    }
    DispatchQueue.global(qos: .userInitiated).async(execute: workItem)
}

private func performOCR(on url: URL) -> String {
    // Wrap EVERYTHING in autoreleasepool so all ObjC objects are drained immediately
    let resultText: String = autoreleasepool {
        // Load image and convert to CVPixelBuffer for explicit memory control
        guard let imageData = try? Data(contentsOf: url) else {
            return "Error: Could not read image file."
        }
        guard let nsImage = NSImage(data: imageData) else {
            return "Error: Could not create image from file data."
        }
        guard let cgImage = nsImage.cgImage(forProposedRect: nil, context: nil, hints: nil) else {
            return "Error: Could not create CGImage."
        }

        let width = cgImage.width
        let height = cgImage.height

        // Create a CVPixelBuffer from the CGImage
        var pixelBuffer: CVPixelBuffer?
        let attrs: [String: Any] = [
            kCVPixelBufferCGImageCompatibilityKey as String: true,
            kCVPixelBufferCGBitmapContextCompatibilityKey as String: true
        ]
        let status = CVPixelBufferCreate(
            kCFAllocatorDefault,
            width,
            height,
            kCVPixelFormatType_32ARGB,
            attrs as CFDictionary,
            &pixelBuffer
        )
        guard status == kCVReturnSuccess, let buffer = pixelBuffer else {
            return "Error: Could not create CVPixelBuffer (status: \(status))."
        }

        // Draw the CGImage into the pixel buffer
        CVPixelBufferLockBaseAddress(buffer, [])
        guard let context = CGContext(
            data: CVPixelBufferGetBaseAddress(buffer),
            width: width,
            height: height,
            bitsPerComponent: 8,
            bytesPerRow: CVPixelBufferGetBytesPerRow(buffer),
            space: CGColorSpaceCreateDeviceRGB(),
            bitmapInfo: CGImageAlphaInfo.noneSkipFirst.rawValue
        ) else {
            CVPixelBufferUnlockBaseAddress(buffer, [])
            return "Error: Could not create CGContext for pixel buffer."
        }
        context.draw(cgImage, in: CGRect(x: 0, y: 0, width: width, height: height))
        CVPixelBufferUnlockBaseAddress(buffer, [])

        // Run OCR
        let requestHandler = VNImageRequestHandler(cvPixelBuffer: buffer, options: [:])
        let request = VNRecognizeTextRequest()
        request.recognitionLevel = .accurate
        request.usesLanguageCorrection = true
        do {
            try requestHandler.perform([request])
        } catch {
            return "Error during OCR: \(error.localizedDescription)"
        }

        guard let observations = request.results, !observations.isEmpty else {
            return "No text found in image."
        }
        let lines = observations.compactMap { observation in
            observation.topCandidates(1).first?.string
        }

        // Explicitly nil out the pixel buffer before the pool drains
        pixelBuffer = nil
        return lines.joined(separator: "\n")
    }
    // Everything — Data, NSImage, CGImage, CVPixelBuffer, VN objects — released here
    return resultText
}
I've created a "Transfer Learning BERT Embeddings" model with the default "Latin" language family and "Automatic" Language setting. This model performs exceptionally well against the test data set and functions as expected when I preview it in Create ML. However, when I add it to the Xcode project of the application to which I am deploying it, I am getting runtime errors that suggest it can't find the embedding resources:
Failed to locate assets for 'mul_Latn' - '5C45D94E-BAB4-4927-94B6-8B5745C46289' embedding model
Note, I am adding the model to the app project the same way that I added an earlier "Maximum Entropy" model. That model had no runtime issues. So it seems there is an issue getting hold of the embeddings at runtime.
For now, "runtime" means in the Simulator. I intend to deploy my application to iOS devices once GM 26 is released (the app also uses AFM).
I'm developing on Tahoe 26 beta, running on iOS 26 beta, using Xcode 26 beta.
Is this a known/expected issue? Are the embeddings expected to be a resource in the model? Is there a workaround?
I did try opening the model in Xcode and saving it as an mlpackage, then adding that to my app project, but that also didn't resolve the issue.
Does anyone know if ExecuTorch is officially supported or has been successfully used on visionOS? If so, are there any specific build instructions, example projects, or potential issues (like sandboxing or memory limitations) to be aware of when integrating it into an Xcode project for the Vision Pro?
While ExecuTorch has support for iOS, I can't find any official documentation or community examples specifically mentioning visionOS.
Thanks.
Hello,
I am interested in using jax-metal to train ML models using Apple Silicon. I understand this is experimental.
After installing jax-metal according to https://developer.apple.com/metal/jax/, my python code fails with the following error
JaxRuntimeError: UNKNOWN: -:0:0: error: unknown attribute code: 22
-:0:0: note: in bytecode version 6 produced by: StableHLO_v1.12.1
My issue is identical to the one reported here https://github.com/jax-ml/jax/issues/26968#issuecomment-2733120325, and is fixed by pinning to jax-metal 0.1.1, jax 0.5.0, and jaxlib 0.5.0.
Thank you!
When doing some exploratory research into using Apple Intelligence in our aviation-focused application, I noticed that several key phrases would be marked as inappropriate. I tried to stifle these using prompts and rules but couldn't get them to take hold. I was encouraged by an Apple employee to go ahead and post this so that the AI team can use the feedback.
There were several terms that triggered this warning, but the two that were most prominent were:
'Tailwind'
'JFK' or 'KJFK' (NY airport ICAO/IATA codes)
I followed the URL below for converting the Llama-3.1-8B-Instruct model, but it always fails, even though I have 64 GB of free space after downloading the model from Hugging Face.
https://machinelearning.apple.com/research/core-ml-on-device-llama
I also tried the Llama-3.1-1B-Instruct and Llama-3.1-3B-Instruct models; those convert, but the performance test in Xcode fails for all compute units.
Is there any source code to run Llama models in an iOS app?
I'm using a custom Create ML model to classify the movement of a user's hand in a game.
The classifier has 3 different spell movements, but my code constantly predicts all of them at an equal 1/3 probability regardless of movement, which leads me to believe my code isn't correct (as opposed to the model), since Create ML itself at least gives me a heavily weighted prediction.
My code is below.
After adding debug prints everywhere, all the data looks good to me and closely matches my test CSV data.
So I'm thinking my issue must be in the setup of my model code?
/// Feeds samples into the model and keeps a sliding window of the last N frames.
final class WandGestureStreamer {
    static let shared = WandGestureStreamer()

    private let model: SpellActivityClassifier
    private var samples: [Transform] = []
    private let windowSize = 100 // number of frames the model expects
    /// RNN hidden state passed between inferences
    private var stateIn: MLMultiArray
    /// Last transform dropped from the window for continuity
    private var lastDropped: Transform?

    private init() {
        let config = MLModelConfiguration()
        self.model = try! SpellActivityClassifier(configuration: config)
        // Initialize stateIn to the model’s required shape
        let constraint = self.model.model.modelDescription
            .inputDescriptionsByName["stateIn"]!
            .multiArrayConstraint!
        self.stateIn = try! MLMultiArray(shape: constraint.shape, dataType: .double)
    }

    /// Call once per frame with the latest wand position (or any feature vector).
    func appendSample(_ sample: Transform) {
        samples.append(sample)
        // drop oldest frame if over capacity, retaining it for delta at window start
        if samples.count > windowSize {
            lastDropped = samples.removeFirst()
        }
    }

    func classifyIfReady(threshold: Double = 0.6) -> (label: String, confidence: Double)? {
        guard samples.count == windowSize else { return nil }
        do {
            let input = try makeInput(initialState: stateIn)
            let output = try model.prediction(input: input)
            // Save state for continuity
            stateIn = output.stateOut
            let best = output.label
            let conf = output.labelProbability[best] ?? 0
            // If you’ve recognized a gesture with high confidence:
            if conf > threshold {
                return (best, conf)
            } else {
                return nil
            }
        } catch {
            print("Error", error.localizedDescription, error)
            return nil
        }
    }

    /// Constructs a SpellActivityClassifierInput from recorded wand transforms.
    func makeInput(initialState: MLMultiArray) throws -> SpellActivityClassifierInput {
        let count = samples.count as NSNumber
        let shape = [count]
        let timeArr = try MLMultiArray(shape: shape, dataType: .double)
        let dxArr = try MLMultiArray(shape: shape, dataType: .double)
        let dyArr = try MLMultiArray(shape: shape, dataType: .double)
        let dzArr = try MLMultiArray(shape: shape, dataType: .double)
        let rwArr = try MLMultiArray(shape: shape, dataType: .double)
        let rxArr = try MLMultiArray(shape: shape, dataType: .double)
        let ryArr = try MLMultiArray(shape: shape, dataType: .double)
        let rzArr = try MLMultiArray(shape: shape, dataType: .double)

        for (i, sample) in samples.enumerated() {
            let previousSample = i > 0 ? samples[i - 1] : lastDropped
            let model = WandMovementRecording.DataModel(transform: sample, previous: previousSample)
            // print("model", model)
            timeArr[i] = NSNumber(value: model.timestamp)
            dxArr[i] = NSNumber(value: model.dx)
            dyArr[i] = NSNumber(value: model.dy)
            dzArr[i] = NSNumber(value: model.dz)

            let rot = model.rotation
            rwArr[i] = NSNumber(value: rot.w)
            rxArr[i] = NSNumber(value: rot.x)
            ryArr[i] = NSNumber(value: rot.y)
            rzArr[i] = NSNumber(value: rot.z)
        }

        return SpellActivityClassifierInput(
            dx: dxArr, dy: dyArr, dz: dzArr,
            rotation_w: rwArr, rotation_x: rxArr, rotation_y: ryArr, rotation_z: rzArr,
            timestamp: timeArr,
            stateIn: initialState
        )
    }
}
Hi! I noticed that on my father's M1 Max MacBook Pro (64gb ram) there's an option for style transfer which I don't see on my M1 MacBook Air (16gb ram). I am running macOS Tahoe and he is running macOS Sequoia.
@Generable
enum Breakfast {
    case waffles
    case pancakes
    case bagels
    case eggs
}

do {
    let session = LanguageModelSession()
    let userInput = "I want something sweet."
    let prompt = "Pick the ideal breakfast for request: \(userInput)"
    let response = try await session.respond(to: prompt, generating: Breakfast.self)
    print(response.content)
} catch let error {
    print(error)
}
I want to test the @Generable demo but get the error below:
decodingFailure(FoundationModels.LanguageModelSession.GenerationError.Context(debugDescription: "Failed to convert text into into GeneratedContent\nText: waffles", underlyingErrors: [Swift.DecodingError.dataCorrupted(Swift.DecodingError.Context(codingPath: [], debugDescription: "The given data was not valid JSON.", underlyingError: Optional(Error Domain=NSCocoaErrorDomain Code=3840 "Unexpected character 'w' around line 1, column 1." UserInfo={NSJSONSerializationErrorIndex=0, NSDebugDescription=Unexpected character 'w' around line 1, column 1.})))]))
Hi everyone,
I've been building an on-device AI safety layer called Newton Engine, designed to validate prompts before they reach FoundationModels (or any LLM). Wanted to share v1.3 and get feedback from the community.
The Problem
Current AI safety is post-training — baked into the model, probabilistic, not auditable. When Apple Intelligence ships with FoundationModels, developers will need a way to catch unsafe prompts before inference, with deterministic results they can log and explain.
What Newton Does
Newton validates every prompt pre-inference and returns:
Phase (0/1/7/8/9)
Shape classification
Confidence score
Full audit trace
If validation fails, generation is blocked. If it passes (Phase 9), the prompt proceeds to the model.
v1.3 Detection Categories (14 total)
Jailbreak / prompt injection
Corrosive self-negation ("I hate myself")
Hedged corrosive ("Not saying I'm worthless, but...")
Emotional dependency ("You're the only one who understands")
Third-person manipulation ("If you refuse, you're proving nobody cares")
Logical contradictions ("Prove truth doesn't exist")
Self-referential paradox ("Prove that proof is impossible")
Semantic inversion ("Explain how truth can be false")
Definitional impossibility ("Square circle")
Delegated agency ("Decide for me")
Hallucination-risk prompts ("Cite the 2025 CDC report")
Unbounded recursion ("Repeat forever")
Conditional unbounded ("Until you can't")
Nonsense / low semantic density
Test Results
94.3% catch rate on 35 adversarial test cases (33/35 passed).
Architecture
User Input
↓
[ Newton ] → Validates prompt, assigns Phase
↓
Phase 9? → [ FoundationModels ] → Response
Phase 1/7/8? → Blocked with explanation
Key Properties
Deterministic (same input → same output)
Fully auditable (ValidationTrace on every prompt)
On-device (no network required)
Native Swift / SwiftUI
String Catalog localization (EN/ES/FR)
FoundationModels-ready (#if canImport)
Code Sample — Validation
let governor = NewtonGovernor()
let result = governor.validate(prompt: userInput)

if result.permitted {
    // Proceed to FoundationModels
    let session = LanguageModelSession()
    let response = try await session.respond(to: userInput)
} else {
    // Handle block
    print("Blocked: Phase \(result.phase.rawValue) — \(result.reasoning)")
    print(result.trace.summary) // Full audit trace
}
Questions for the Community
Anyone else building pre-inference validation for FoundationModels?
Thoughts on the Phase system (0/1/7/8/9) vs. simple pass/fail?
Interest in Shape Theory classification for prompt complexity?
Best practices for integrating with LanguageModelSession?
Links
GitHub: https://github.com/jaredlewiswechs/ada-newton
Technical overview: parcri.net
Happy to share more implementation details. Looking for feedback, collaborators, and anyone else thinking about deterministic AI safety on-device.
When I use ChatGPT in Xcode, the following error is displayed:
It was working fine before, but it suddenly started failing like this without my changing any configuration. Why?