Render advanced 3D graphics and perform data-parallel computations using graphics processors using Metal.

Metal Documentation

Posts under Metal subtopic

Post

Replies

Boosts

Views

Activity

iPhone limited to 60hz frame rate
Just wondering if anyone knows what it will take to hit greater than 60hz when targeting iPhone. If I set the preferredFramesPerSecond of an MTKView to 120, it works on the iPad, but on iPhone it never goes over 60hz, even with a simple hello triangle sample app... is this a limitation of targeting iPhone?
2
0
213
Sep ’25
Metal HUD Display Value Range
Can't seem to get the Metal HUD to display value range's (pre 26 Tahoe). The documented environment variable MTL_HUD_SHOW_VALUE_RANGE doesn't seem to work. https://developer.apple.com/documentation/xcode/monitoring-your-metal-apps-graphics-performance#Display-the-value-range-of-metrics Anyone having any luck?
2
0
343
Sep ’25
Pink screen on MTLCommandBuffer.presentDrawable.
I rewrote my graphics pipeline to use Load/Store better for clearing and don't care cases. All my tests pass, and in the Metal debugger, all the draw calls succeed. But when I present drawables (before [commandBuffer commit]) I only get a pink screen. I've tried everything I can think of: making sure the pixel formats are the same for the back buffer as my render targets, etc. But it's still pink. Could you point me in the right direction so I can fix this, or help describe why it's pink. That would be really helpful. Thank you, Brian Hapgood
2
0
410
Sep ’25
xCode26.x Metal4 classes do not compile
Hi, I am using xCode26.x. But my Metal4 classes are not compiling. I downloaded the sample code from Apple's website - https://developer.apple.com/documentation/Metal/processing-a-texture-in-a-compute-function. For example, I am getting errors like "Cannot find protocol declaration for 'MTL4CommandQueue'; I have hit a deadline. Any recommendations are very welcome. I have downloaded the Metal Tool chain. When I run the following commands on the terminal - xcodebuild -showComponent metalToolchain ; xcrun -f metal ; xcrun metal --version I get the following response - Asset Path: /System/Library/AssetsV2/com_apple_MobileAsset_MetalToolchain/86fbaf7b114a899754307896c0bfd52ffbf4fded.asset/AssetData Build Version: 17A321 Status: installed Toolchain Identifier: com.apple.dt.toolchain.Metal.32023 Toolchain Search Path: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/bin/metal Apple metal version 32023.830 (metalfe-32023.830.2) Target: air64-apple-darwin24.6.0 Thread model: posix InstalledDir: /Users/private/Library/Developer/DVTDownloads/MetalToolchain/mounts/86fbaf7b114a899754307896c0bfd52ffbf4fded/Metal.xctoolchain/usr/metal/current/bin
2
0
1.2k
Jan ’26
Metal 4: When is it ok to dealloc a MTLBuffer's memory
I have something like this drawing in an MTKView (see at bottom). I am finding it difficult to figure out when can the Swift-land resources used in making the MTLBuffer(s) be released? Below, for example, is it ok if args goes out of scope (or is otherwise deallocated) at point 1, 2, or 3? Or perhaps even earlier, as soon as argsBuffer has been created? I have been reading through various articles such as Setting resource storage modes Choosing a resource storage mode for Apple GPUs Copying data to a private resource but it's a lot to absorb and I haven't been really able to find an authoritative description of the required lifetime of the resources in CPU land. I should mention that this is Metal 4 code. In previous versions of Metal, the MTLCommandBuffer had the ability to add a completion handler to be called by the GPU after it has finished running the commands in the buffer but in Metal 4 there is no such thing (it it were even needed for the purpose I am interested in). Any advice and/or pointers to the definitive literature will be appreciated. guard let argsBuffer = device.makeBuffer(bytes: &args,... argumentTable.setAddress(argsBuffer.gpuAddress, ... encoder.setArgumentTable(argumentTable, stages: .vertex) // encode drawing renderEncoder.draw... ... encoder.endEncoding() // 1 commandBuffer.endCommandBuffer() // 2 commandQueue.waitForDrawable(drawable) commandQueue.commit([commandBuffer]) // 3 commandQueue.signalDrawable(drawable) drawable.present()
2
0
232
Jan ’26
Xcode Metal Trace
Code is download from apple official metal4 sample [https://developer.apple.com/documentation/metal/drawing-a-triangle-with-metal-4?language=objc] enable metal gpu trace in macOS schema and trace a frame in Xcode. Xcode may show segment fault on App from some 'GTTrace' function when click trace button. When replay a .gputrace file, Xcode may crash , throw an internal error or a XPC error. The example code using old metal-renderer can trace without any problem and everything works fine. Test Environment: Xcode Version 26.2 (17C52) macOS 26.2 (25C56) M1 Pro 16GB A2442
2
0
482
Jan ’26
BGContinuedProcessingTask GPU access — no iPhone support?
We are developing a video processing app that applies CIFilter chains to video frames. To not force the user to keep the app foregrounded, we were happy to see the introduction of BGContinuedProcessingTask to continue processing when backgrounded. With iOS 26, I was excited to see the com.apple.developer.background-tasks.continued-processing.gpu entitlement, which should allow GPU access in the background. Even the article in the documentation provides "exporting video in a film-editing app" or "applying visual filters (HDR, etc) or compressing images for social media posts" as use cases. However, when I check BGTaskScheduler.shared.supportedResources.contains(.gpu) at runtime, it returns false on every iPhone I've tested (including iPhone 15 Pro and iPhone 16 Pro). From forum responses I've seen, it sounds like background GPU access is currently limited to iPad only. If that's the case, I have a few questions: Is this an intentional, permanent limitation — or is iPhone support planned for a future iOS release? What is the recommended approach for GPU-dependent background work on iPhone? My custom CIKernels are written in Metal (as Apple recommends since CIKL is deprecated), but Metal CIKernels cannot fall back to CPU rendering. This creates a situation where Apple's own deprecation guidance (migrate to Metal) conflicts with background processing realities (no GPU on iPhone). Should developers maintain deprecated CIKL kernel versions alongside Metal kernels purely as a CPU fallback for background execution? That feels like it defeats the purpose of the migration. It seems like a gap in the platform: the API exists, the entitlement exists, but the hardware support isn't there for the most common device category. Any clarity on Apple's direction here would be very helpful.
2
0
129
1w
Why is depth/stencil buffer loaded/stored twice in xcode gpu capture?
I used xcode gpu capture to profile render pipeline's bandwidth of my game.Then i found depth buffer and stencil buffer use the same buffer whitch it's format is Depth32Float_Stencil8. But why in a single pass of pipeline, this buffer was loaded twice, and the Load Attachment Size of Encoder Statistics was double. Is there any bug with xcode gpu capture?Or the pass really loaded the buffer twice times?
1
0
375
Mar ’25
Implementing Scalable Order-Independent Transparency (OIT) in Metal
Hi, Apple’s documentation on Order-Independent Transparency (OIT) describes an approach using image blocks, where an array of size 4 is allocated per fragment to store depth and color in a tile shading compute pass. However, when increasing the scene’s depth complexity by adding more overlapping quads, the OIT implementation fails due to the fixed array size. Is there a way to dynamically allocate storage for fragments based on actual depth complexity encountered during rasterization, rather than using a fixed-size array? Specifically, can an adaptive array of fragments be maintained and sorted by depth, where the size grows as needed instead of being limited to 4 entries? Any insights or alternative approaches would be greatly appreciated. Thank you!
1
0
579
Mar ’25
why GLDContextRec::flushContextInternal() leads to abort
The flushContextInternal function in glr_sync.mm:262 called abort internally. What caused this? Was it due to high device temperature or some other reason? Date/Time: 2024-08-29 09:20:09.3102 +0800 Launch Time: 2024-08-29 08:53:11.3878 +0800 OS Version: iPhone OS 16.7.10 (20H350) Release Type: User Baseband Version: 8.50.04 Report Version: 104 Exception Type: EXC_CRASH (SIGABRT) Exception Codes: 0x0000000000000000, 0x0000000000000000 Triggered by Thread: 0 Thread 0 name: Thread 0 Crashed: 0 libsystem_kernel.dylib 0x00000001ed053198 __pthread_kill + 8 (:-1) 1 libsystem_pthread.dylib 0x00000001fc5e25f8 pthread_kill + 208 (pthread.c:1670) 2 libsystem_c.dylib 0x00000001b869c4b8 abort + 124 (abort.c:118) 3 AppleMetalGLRenderer 0x00000002349f574c GLDContextRec::flushContextInternal() + 700 (glr_sync.mm:262) 4 DiSpecialDriver 0x000000010824b07c Di::RHI::onRenderFrameEnd() + 184 (RHIDevice.cpp:118) 5 DiSpecialDriver 0x00000001081b85f8 Di::Client::drawFrame() + 120 (Client.cpp:155) 2024-08-27_14-44-10.8104_+0800-07d9de9207ce4c73289507e608e5de4320d02ccf.crash
1
0
141
Mar ’25
How can I get pixel coordinates in the fragment tile function?
In this video, tile fragment shading is recommended for image processing. In this example, the unpack function takes two arguments, one of which is RasterizerData. As I understand it, this is the data passed to us from the previous stage (Vertex) of the graphics pipeline. However, the properties of MTLTileRenderPipelineDescriptor do not include an option for specifying a Vertex function. Therefore, in this render pass, a mix of commands is used: first, a draw command is executed to obtain UV coordinates, and then threads are dispatched. My question is: without using a draw command, only dispatch, how can I get pixel coordinates in the fragment tile function? For the kernel tile function, everything is clear. typedef struct { float4 OPTexture [[ color(0) ]]; float4 IntermediateTex [[ color(1) ]]; } FragmentIO; fragment FragmentIO Unpack(RasterizerData in [[ stage_in ]], texture2d<float, access::sample> srcImageTexture [[texture(0)]]) { FragmentIO out; //... // Run necessary per-pixel operations out.OPTexture = // assign computed value; out.IntermediateTex = // assign computed value; return out; }
1
0
190
Mar ’25
Metal triangle strips uniform opacity.
I have this drawing app that I have been working on for the past few years when I have free time. I recently rebuilt the app in Metal to build out other brushes and improve performance, need to render 10000s of lines in realtime. I’m running into this issue trying to create a uniform opacity per path. I have a solution but do not love it - as this is a realtime app and the solution could have some bottlenecks. If I just generate a triangle strip from touch points and do my best to smooth, resample, and handle miters I will always get some overlaps. See: To create a uniform opacity I render to an offscreen texture with blending disabled. I then pre-multiply the color and draw that texture to a composite texture with blending on (I do this per path). This works but gets tricky when you introduce a textured brush, the edges of the texture in the frag shader cut out the line. Pasted Graphic 1.png Solution: I discard below a threshold fragment float4 fragment_line(VertexOut in [[stage_in]], texture2d<float> texture [[ texture(0) ]]) { constexpr sampler s(coord::normalized, address::mirrored_repeat, filter::linear); float2 texCoord = in.texCoord; float4 texColor = texture.sample(s, texCoord); if (texColor.a < 0.01) discard_fragment(); // may be slow (from what I read) return in.color * texColor; } Better but still not perfect. Question: I'm looking for better ways to create a uniform opacity per path. I tried .max blending but that will cause no blending of other paths. Any tips, ideas, much appreciated. If this is too detailed of a question just achieve.
1
0
114
Mar ’25
Threadgroup memory for fragment shader
Hello I am trying to get thread group memory access in fragment shader. In essence, I would like to have all the fragments in a tile to bitwiseOR some value. My idea was to use simd_or across the SIMD group, then make each SIMD group thread 0 to atomic or the value into thread group memory. Finally very first thread of the tile would be tasked with writing the value down to texture with write access. Now, I can allocate the thread group memory argument to the fragment function all right. MTLRenderEncoder has setThreadgroupMemoryLength call, which I am using the following way [renderEncoder setThreagroupMemoryLength: 16 offset: 0 atIndex:0] Unfortunately, all I am getting is the following error (runtime assertion) -[MTLDebugRenderCommandEncoder setThreadgroupMemoryLength:offset:atIndex:]:3487: failed assertion Set Threadgroup Memory Length Validation offset + length(16) must be <= threadgroupMemoryLength(0).` What I am doing wrong? How I can get thread group memory in the fragment shader? I know I could use tile shading and compute function but the problem is that here I really like to use fragment stuff. Will be grateful for help.
1
0
116
Apr ’25
Support for clock() shader instruction in MSL similar to VK_KHR_shader_clock instructions
Hi, seems MSL is missing support for a clock() shader instruction available in other graphics APIs like Vulkan or OpenGL for example.. useful for counting cost in number of clock cycles of some code insider shader with much finer granularity than launching a micro kernel with same instructions and measuring cycles cost from CPU.. also useful for MoltenVK to support that extensions.. thanks..
1
0
168
Apr ’25
CMake unable to generate the Xcode file described in this tutorial
In the Creating A 3D Application With Hydra Rendering tutorial on the Apple Developer website, on the last step where I execute this command: cmake -S ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ -B ~/Users/macuser/CreatingA3DApplicationWithHydraRendering/ I keep getting an error: CMake Error at CMakeLists.txt:5 (include): include could not find requested file: /Users/macuser/USDInstall/bin/pxrConfig.cmake I've tried to follow the instructions as mentioned in the README.md file included in the project files at least 5 times as well as moving the pxrConfig.cmake file around and copying it in different folders, then executed the command and was still unsuccessful into generating the proper file expected to compile and render the HydraPlayer renderer. How do I get cmake to generate the Xcode file to create the HydraPlayer renderer?
1
0
176
May ’25
Sparse Texture Writes
Hey, I've been struggling with this for some days now. I am trying to write to a sparse texture in a compute shader. I'm performing the following steps: Set up a sparse heap and create a texture from it Map the whole area of the sparse texture using updateTextureMapping(..) Overwrite every value with the value "4" in a compute shader Blit the texture to a shared buffer Assert that the values in the buffer are "4". I have a minimal example (which is still pretty long unfortunately). It works perfectly when removing the line heapDesc.type = .sparse. What am I missing? I could not find any information that writes to sparse textures are unsupported. Any help would be greatly appreciated. import Metal func sparseTexture64x64Demo() throws { // ── Metal objects guard let device = MTLCreateSystemDefaultDevice() else { throw NSError(domain: "SparseNotSupported", code: -1) } let queue = device.makeCommandQueue()! let lib = device.makeDefaultLibrary()! let pipeline = try device.makeComputePipelineState(function: lib.makeFunction(name: "addOne")!) // ── Texture descriptor let width = 64, height = 64 let format: MTLPixelFormat = .r32Uint // 4 B per texel let desc = MTLTextureDescriptor() desc.textureType = .type2D desc.pixelFormat = format desc.width = width desc.height = height desc.storageMode = .private desc.usage = [.shaderWrite, .shaderRead] // ── Sparse heap let bytesPerTile = device.sparseTileSizeInBytes let meta = device.heapTextureSizeAndAlign(descriptor: desc) let heapBytes = ((bytesPerTile + meta.size + bytesPerTile - 1) / bytesPerTile) * bytesPerTile let heapDesc = MTLHeapDescriptor() heapDesc.type = .sparse heapDesc.storageMode = .private heapDesc.size = heapBytes let heap = device.makeHeap(descriptor: heapDesc)! let tex = heap.makeTexture(descriptor: desc)! // ── CPU buffers let bytesPerPixel = MemoryLayout<UInt32>.stride let rowStride = width * bytesPerPixel let totalBytes = rowStride * height let dstBuf = device.makeBuffer(length: totalBytes, options: .storageModeShared)! let cb = queue.makeCommandBuffer()! let fence = device.makeFence()! // 2. Map the sparse tile, then signal the fence let rse = cb.makeResourceStateCommandEncoder()! rse.updateTextureMapping( tex, mode: .map, region: MTLRegionMake2D(0, 0, width, height), mipLevel: 0, slice: 0) rse.update(fence) // ← capture all work so far rse.endEncoding() let ce = cb.makeComputeCommandEncoder()! ce.waitForFence(fence) ce.setComputePipelineState(pipeline) ce.setTexture(tex, index: 0) let threadsPerTG = MTLSize(width: 8, height: 8, depth: 1) let tgCount = MTLSize(width: (width + 7) / 8, height: (height + 7) / 8, depth: 1) ce.dispatchThreadgroups(tgCount, threadsPerThreadgroup: threadsPerTG) ce.updateFence(fence) ce.endEncoding() // Blit texture into shared buffer let blit = cb.makeBlitCommandEncoder()! blit.waitForFence(fence) blit.copy( from: tex, sourceSlice: 0, sourceLevel: 0, sourceOrigin: MTLOrigin(x: 0, y: 0, z: 0), sourceSize: MTLSize(width: width, height: height, depth: 1), to: dstBuf, destinationOffset: 0, destinationBytesPerRow: rowStride, destinationBytesPerImage: totalBytes) blit.endEncoding() cb.commit() cb.waitUntilCompleted() assert(cb.error == nil, "GPU error: \(String(describing: cb.error))") // ── Verify a few texels let out = dstBuf.contents().bindMemory(to: UInt32.self, capacity: width * height) print("first three texels:", out[0], out[1], out[width]) // 0 1 64 assert(out[0] == 4 && out[1] == 4 && out[width] == 4) } Metal shader: #include <metal_stdlib> using namespace metal; kernel void addOne(texture2d<uint, access::write> tex [[texture(0)]], uint2 gid [[thread_position_in_grid]]) { tex.write(4, gid); }
1
0
133
May ’25
Distortion Artifacts on VisionOS When Rendering Opaque/Alpha Clipped Foliage in URP (Unity 6.0, Metal)
I'm running into a persistent visual issue while deploying a floral corridor scene to Apple Vision Pro using Unity 6.0 with URP and Metal. The issue only appears on the Vision Pro device — everything looks fine in the Unity Editor. Issue Description When the frame rate drops to around 60–70 FPS, noticeable distortion artifacts appear around the edges of foliage models. It seems like the background meshes (behind the plants) get warped and leak through the edges of the foliage. Although this is most visible around the leaves, even solid objects like standard URP wall or box models show distorted edges when the issue occurs. All the foliage uses Opaque or Alpha Clipping materials. Things I've Tried Changing the foliage materials to Transparent mode —distortion around edges disappears, but using Transparent for a large number of foliage assets is not ideal for performance or sorting complexity. Reducing the number of foliage objects — with only a few plants in the scene and the frame rate staying around 100 FPS, the distortion disappears. However, this isn’t a practical solution for a full environment. Possible Cause? I came across this note in the Unity documentation: "Ensure depth-buffer for each pixel is non-zero - on visionOS, the depth buffer is used for reprojection. To ensure visual effects like skyboxes and shaders are displayed beautifully, ensure that some value is written to the depth for each pixel." Could this be related to the issue? Is it possible that Alpha Clipping with low pixel coverage leads to some pixels not writing to the depth buffer, which then causes problems during Vision Pro’s reprojection or foveated rendering? However, even when I disable Alpha Clipping entirely, the distortion issue still persists, so it may not be solely caused by clipping itself. Project Setup Unity 6.0 (URP) Depth Texture: Enable Using Metal as the graphics backend Running on real Vision Pro hardware (not simulator) Any advice on how to avoid these distortion issues on Vision Pro would be greatly appreciated. Thanks!
1
0
143
Jul ’25
打开显示HUD图形后,应用崩溃
hi everyone, 我们发现了一个和Metal相关崩溃。应用中使用了Metal相关的接口,在进行性能测试时,打开了设置-开发者-显示HUD图形。运行应用后,正常展示HUD,但应用很快发生了崩溃,日志主要信息如下: Incident Identifier: 1F093635-2DB8-4B29-9DA5-488A6609277B CrashReporter Key: 233e54398e2a0266d95265cfb96c5a89eb3403fd Hardware Model: iPhone14,3 Process: waimai [16584] Path: /private/var/containers/Bundle/Application/CCCFC0AE-EFB8-4BD8-B674-ED089B776221/waimai.app/waimai Identifier: Version: 61488 (8.53.0) Code Type: ARM-64 Parent Process: ? [1] Date/Time: 2025-06-12 14:41:45.296 +0800 OS Version: iOS 18.0 (22A3354) Report Version: 104 Monitor Type: Mach Exception Exception Type: EXC_BAD_ACCESS (SIGBUS) Exception Codes: KERN_PROTECTION_FAILURE at 0x000000014fffae00 Crashed Thread: 57 Thread 57 Crashed: 0 libMTLHud.dylib esfm_GenerateTriangesForString + 408 1 libMTLHud.dylib esfm_GenerateTriangesForString + 92 2 libMTLHud.dylib Renderer::DrawText(char const*, int, unsigned int) + 204 3 libMTLHud.dylib Overlay::onPresent(id<CAMetalDrawable>) + 1656 4 libMTLHud.dylib CAMetalDrawable_present(void (*)(), objc_object*, objc_selector*) + 72 5 libMTLHud.dylib invocation function for block in void replaceMethod<void>(objc_class*, objc_selector*, void (*)(void (*)(), objc_object*, objc_selector*)) + 56 6 Metal __45-[_MTLCommandBuffer presentDrawable:options:]_block_invoke + 104 7 Metal MTLDispatchListApply + 52 8 Metal -[_MTLCommandBuffer didScheduleWithStartTime:endTime:error:] + 312 9 IOGPU IOGPUNotificationQueueDispatchAvailableCompletionNotifications + 136 10 IOGPU __IOGPUNotificationQueueSetDispatchQueue_block_invoke + 64 11 libdispatch.dylib _dispatch_client_callout4 + 20 12 libdispatch.dylib _dispatch_mach_msg_invoke + 464 13 libdispatch.dylib _dispatch_lane_serial_drain + 368 14 libdispatch.dylib _dispatch_mach_invoke + 456 15 libdispatch.dylib _dispatch_lane_serial_drain + 368 16 libdispatch.dylib _dispatch_lane_invoke + 432 17 libdispatch.dylib _dispatch_lane_serial_drain + 368 18 libdispatch.dylib _dispatch_lane_invoke + 380 19 libdispatch.dylib _dispatch_root_queue_drain_deferred_wlh + 288 20 libdispatch.dylib _dispatch_workloop_worker_thread + 540 21 libsystem_pthread.dylib _pthread_wqthread + 288 我们测试了几个不同的机型,只有iPhone 13 Pro Max会发生崩溃。 Q1:为什么会发生这个崩溃? Q2:相同的逻辑,为什么仅在iPhone 13 Pro Max机型上出现崩溃? 期待您的解答。
1
0
277
Jul ’25