I read somewhere that the frames are returned in decode order instead of presentation order when using AVAssetReader. The documentation seems sparse on the subject. I have so far failed to find a video file where the frames are not returned in presentation order.
Can anyone confirm the frames are actually returned in decode order?
Explore the integration of media technologies within your app. Discuss working with audio, video, camera, and other media functionalities.
Selecting any option will automatically load the page
Post
Replies
Boosts
Views
Activity
I have a question regarding the behavior of AVAudioSession.sharedInstance().outputVolume.
Observed behavior:
When the app is in the foreground, I read audioSession.outputVolume (for example, 0.1).
The app is then moved to the background.
While the app is in the background, the user changes the system volume using the hardware buttons (for example, to 0.5).
When the app returns to the foreground, audioSession.outputVolume still reports the previous value (0.1).
From my testing, outputVolume only seems to update when the system volume is changed while the app is in the foreground. Volume changes made while the app is in the background are not reflected when the app returns to the foreground.
Questions:
According to Apple’s documentation for AVAudioSession.outputVolume:
“The systemwide output volume set by the user.”
https://developer.apple.com/documentation/avfaudio/avaudiosession/outputvolume
However, based on our testing on iOS 18.6.2 and iOS 18.1, the observed behavior seems to differ from this description.
Questions:
The documentation states that outputVolume represents the system-wide volume set by the user. In our testing, the value does not reflect volume changes made while the app is in the background and only updates when the app is in the foreground.Is this the expected behavior of AVAudioSession.outputVolume?
Is there any other recommended way in Swift to retrieve the current system volume that reflects user changes made both while the app is in the foreground and while it is in the background?
Any clarification on the intended behavior or recommended handling would be greatly appreciated.
In iOS 26 (Developer Beta), the AVCaptureMetadataOutputObjectsDelegate no longer receives callbacks when metadataOutput.metadataObjectTypes = [.face] is set. On earlier iOS versions the issue does not occur. Interestingly, face detection works if I set the sessionPreset to .medium, but not with .high — except on the iPhone 16 Pro Max, where it works regardless.
[Note: this issue was happening on a main testing device, and after testing the same code on other devices, this issue is only happening on 1 out of 4 devices]
We are successfully getting a MusicCatalogResourceResponse for every song ID where we make the MusicCatalogResourceRequest. We are able to display the song title, artist name, and album artwork for each Song in the response.
However - when we go to play the song, there are some songs that play, and several songs that do not play. For the songs that don't play, the console shows “Failed to prepareToPlay error=<MPMusicPlayerControllerErrorDomain.6 "Failed to prepare to play" {}>”
let musicPlayer = ApplicationMusicPlayer.shared
func playSong(_ song: Song) {
musicPlayer.queue = [song]
Task {
try await musicPlayer.prepareToPlay()
try await musicPlayer.play()
}
Is there anything else we can investigate about what may be causing specific song IDs not to play on this specific device?
Even if we remove line 6 musicPlayer.prepareToPlay() we still see the same console error when running playSong with the Songs that don't work.
It is always the same song IDs that we can play and always the same song IDs that we cannot get to play, even trying them across different projects with different bundle identifiers.
We can tap to play a song that works, and it starts playing immediately. Then tap a song that doesn't work, and nothing happens. Then back to a song that works. It's consistent which songs succeed and fail on this device.
Perhaps there is an issue specific to this very iPad when it comes to certain specific songs, but we'd like to be confident that an app relying on MusicKit will be able to play songs that have been successfully loaded with a MusicCatalogResourceResponse.
Thanks for any help or suggestions about what we may be able to investigate further on the device or what we should consider when launching an app that expects anyone with Apple Music to be able to listen to any of the songs loaded by the app.
Specific iPad details: iPad Pro (12.9-inch) (6th generation) running iPadOS 26.4 Beta
Two of the song IDs that won't play on this iPad (even though we can access and display their album artwork and all other information): 943204000 and 1441164805
Hello,
I am building an iOS-only, commercial app that uses AVSpeechSynthesizer with system voices, strictly using the APIs provided by Apple. Before distributing the app, I want to ensure that my current implementation does not conflict with the iOS Software License Agreement (SLA) and is aligned with Apple’s intended usage.
For a better playback experience (more accurate estimation of utterance duration and smoother skip forward/backward during playback), I currently synthesize speech using:
AVSpeechSynthesizer.write(_:toBufferCallback:)
Converting the received AVAudioPCMBuffer buffers into audio data
Storing the audio inside the app sandbox
Playing it back using AVAudioPlayer / AVAudioEngine
The cached audio is:
Generated fully on-device using system voices
Stored only inside the app’s private container
Used only for internal playback controls (timeline, seek, skip ±5 seconds)
Never shared, exported, uploaded, or exposed outside the app
The alternative approaches would be:
Keeping the generated audio entirely in memory (RAM) for playback purposes, without writing it to the file system at any point
Or using AVSpeechSynthesizer.speak(_:) and playing speech strictly in real time which has a poorer user experience compared to my approach
I have reviewed the current iOS Software License Agreement:
https://www.apple.com/legal/sla/docs/iOS18_iPadOS18.pdf
In particular, section (f) mentions restrictions around System Characters, Live Captions, and Personal Voice, including the following excerpt:
“…use … only for your personal, non-commercial use…
No other creation or use of the System Characters, Live Captions, or Personal Voice is permitted by this License, including but not limited to the use, reproduction, display, performance, recording, publishing or redistribution in a … commercial context.”
I do not see a specific reference in the SLA to system text-to-speech voices used via AVSpeechSynthesizer, and I want to be certain that temporarily caching synthesized speech for internal, non-exported playback is acceptable in a commercial app.
My question is:
Is caching AVSpeechSynthesizer system-voice output inside the app sandbox for internal playback acceptable, or is Apple’s recommended approach to rely only on real-time playback (speak(_:)) or strictly in-memory buffering without file storage?
If this question falls outside DTS technical scope and is instead a policy or licensing matter, I would appreciate guidance on the authoritative Apple documentation or the correct Apple team/contact.
Thank you.
Hey,
I have a camera app that captures a ProRaw photo and then runs a few Core Image filters before saving it to the device as a HEIC. However I'm finding that capturing at 48MP is rather slow. Testing a minimal pipeline on an iPhone 16 Pro:
Shutter press => file received in output: 1.2 ~ 1.6s
CIRawFilter created using photo file representation then rendered to context, without any filters: 0.8s ~ 1s
Saving to device ~0.15s
Is this the expected time for capturing processing? The native camera app seems to save the images within half a second. I'm using QualityPrioritization.balanced and the highest resolution available which is 48MP.
Would using the CIRawFilter with the pixelBuffer from the photo output be faster? I tried it but couldn't get it to output an image. Are there any other things I could try to speed this up? Is it possible to capture at 24MP instead?
Thanks,
Alex
I'm creating Live Photos programmatically in my app using the Photos and AVFoundation frameworks. While the Live Photos work perfectly in the Photos app (long press shows motion), users cannot set them as motion wallpapers. The system shows "Motion not available" message.
Here's my approach for creating Live Photos:
// 1. Create video with required metadata
let writer = try AVAssetWriter(outputURL: videoURL, fileType: .mov)
let contentIdentifier = AVMutableMetadataItem()
contentIdentifier.identifier = .quickTimeMetadataContentIdentifier
contentIdentifier.value = assetIdentifier as NSString
writer.metadata = [contentIdentifier]
// Video settings: 882x1920, H.264, 30fps, 2 seconds
// Added still-image-time metadata at middle frame
// 2. Create HEIC image with asset identifier
var makerAppleDict: [String: Any] = [:]
makerAppleDict["17"] = assetIdentifier // Required key for Live Photo
metadata[kCGImagePropertyMakerAppleDictionary as String] = makerAppleDict
// 3. Generate Live Photo
PHLivePhoto.request(
withResourceFileURLs: [photoURL, videoURL],
placeholderImage: nil,
targetSize: .zero,
contentMode: .aspectFit
) { livePhoto, info in
// Success - Live Photo created
}
// 4. Save to Photos library
PHAssetCreationRequest.forAsset().addResource(with: .photo, fileURL: photoURL, options: nil)
PHAssetCreationRequest.forAsset().addResource(with: .pairedVideo, fileURL: videoURL, options: nil)
What I've Tried
Matching exact video specifications from Camera app (882x1920, H.264, 30fps)
Adding all documented metadata (content identifier, still-image-time)
Testing various video durations (1.5s, 2s, 3s)
Different image formats (HEIC, JPEG)
Comparing with exiftool against working Live Photos
Expected Behavior
Live Photos created programmatically should be eligible for motion wallpapers, just like those from the Camera app.
Actual Behavior
System shows "Motion not available" and only allows setting as static wallpaper.
Any insights or workarounds would be greatly appreciated. This is affecting our users who want to use their created content as wallpapers.
Questions
Are there additional undocumented requirements for Live Photos to be wallpaper-eligible?
Is this a deliberate restriction for third-party apps, or a bug?
Has anyone successfully created Live Photos that work as motion wallpapers?
Environment
iOS 17.0 - 18.1
Xcode 16.0
Tested on iPhone 16 Pro
Topic:
Media Technologies
SubTopic:
Photos & Camera
Tags:
LivePhotosKit JS
PhotoKit
Core Image
AVFoundation
Hello,
We are trying to use the new Background Upload Extension to improve uploads of assets (Photos, Live Photos, Videos) in the background in our application.
1 - We implemented the code to upload still images.
We would like to do the same for Live Photos but we are not sure how to proceed since a Live Photo is composed of 2 resources that we would like to upload in the same job so that we keep the association between the 2
resources.
Steps:
A Live Photo is captured on the device.
Our application is notified for new content in the Photo Library and the asset is queued for upload by our application.
The system calls the background upload extension and the Live Photo is prepared for upload but we can schedule a job only for one resource (still photo or paired video) so we can not upload both resources in a single job.
2 - Is there a way to synchronise the application and the extension so that the application would not process the same data or execute the same requests than the extension. For example: our application is working with
tokens and we would like to prevent those tokens to be consumed by the application and the extension at the same
time.
3 - is there a command in xcode or terminal to start or stop the extension, something similar to what exists for processing tasks
(https://developer.apple.com/documentation/backgroundtasks/starting-and-terminating-tasks-during-development).
On macOS 26.2 (Tahoe), the Preview app fails to render many USDZ models correctly.
The issue appears inconsistent:
• Some USDZ files open normally in Preview
• Some USDZ files open but the viewport turns completely black (no geometry, no material, no lighting)
• All the same files open correctly when opened using “Open With → Xcode”
This was not the behavior on macOS 15.7.2, where the exact same USDZ files rendered consistently in Preview without failures.
Hi 👋! We have a SpriteKit-based app where we play AVAudio sounds in three different ways:
Effects (incl. UI sounds) with AVAudioPlayer.
Long looping tracks with AVAudioPlayer.
Short animation effects on the timeline of SpriteKit's SKScene files (effectively SKAudioNode nodes).
We've found that when you exit the app or otherwise interrupt audio plays, future audio plays often fail. For example, there's a WebKit-based video trailer inside the app, and if you play it, our looping background music track (2.) will stop playing, and won't resume as you close the trailer (return from WebKit). This is probably due to us not manually restarting the track (so may well be easily fixed). Periodically played AVAudioPlayer audio (1.) are not affected.
However, the more concerning thing is that the audio tracks on SKScene file timelines (3.) will no longer play. My hypothesis is that AVAudioEngine gets interrupted, and needs to be restarted for those AVAudioNode elements to regain functionality. Thing is, we don't deal with AVAudioEngine at all currently in the app, meaning it is never initiated to begin with.
Obviously things return to normal when you remove the app from short-term memory and restart it. However, it seems many of our users aren't doing this, and often report audio failing presumably due to some interruption in the past without the app ever being cleared from memory.
Any idea why timeline-run SKAudioNodes would fail like this? Should the app react to app backgrounding/foregrounding regarding audio?
Any help would be very much appreciated ✌️!
Is it possible to perform speech-to-text using AVAudioEngine to capture microphone input while being on a FaceTime call at the same time?
I tried implementing this, but whenever I attempt to start the AVAudioEngine while a FaceTime call is active, I get the following error:
“The operation couldn’t be completed. (OSStatus error 2003329396)”
I assume this might be due to microphone resource restrictions during FaceTime, but I’d like to confirm whether this limitation is at the system level or if there’s any possible workaround or entitlement that allows concurrent microphone access.
Has anyone encountered this issue or found a solution?
In the past, when using Lightning, many external devices had to go through MFi certification. However, since the iPhone 15 switched from Lightning to USB-C, is MFi certification still required?
Our company has developed several UVC devices, and we have confirmed that iPads can read frames from external cameras through the external device type in AVFoundation. However, this is not supported on iPhones.
We are currently exploring feasible ways to enable UVC device support on iPhones. Is MFi certification the only option? If so, is the MFi certification process for USB-C the same as it was for Lightning? Does it still require purchasing an MFi chip and manufacturing specially designed USB-C cables?
Hello,
I have been running into issues with setting nowPlayingInfo information, specifically updating information for CarPlay and the CPNowPlayingTemplate.
When I start playback for an item, I see lock screen information update as expected, along with the CarPlay now playing information.
However, the playing items are books with collections of tracks. When I select a new track(chapter) within the book, I set the MPMediaItemPropertyTitle to the new chapter name. This change is reflected correctly on the lock screen, but almost never appears correctly on the CarPlay CPNowPlayingTemplate. The previous chapter title remains set and never updates.
I see "Application exceeded audio metadata throttle limit." in the debug console fairly frequently.
From that a I figured that I need to minimize updates to the nowPlayingInfo dictionary. What I did:
I store the metadata dictionary in a local dictionary and only set values in the main nowPlayingInfo dictionary when they are different from the current value.
I kick off the nowPlayingInfo update via a task that initially sleeps for around 2 seconds (not a final value, just for my current testing). If a previous Task is active, it gets cancelled, so that only one update can happen within that time window.
Neither of these things have been sufficient. I can switch between different titles entirely and the information updates (including cover art).
But when I switch chapters within a title, the MPMediaItemPropertyTitle continues to get dropped. I know the value is getting set, because it updates on the lock screen correctly.
In total, I have 12 keys I update for info, though with the above changes, usually 2-4 of them actually get updated with high frequency.
I am running out of ideas to satisfy the throttling thresholds to accurately display metadata. I could use some advice.
Thanks.
Hello,
My company has an in-store app with FPS SDK 4.x (1024) keys. We've handed those keys over to a trusted third-party and we do not have them. We've been in-store for several years.
The person that created the keys in our organization mistakenly stored them encrypted to our third-party's PGP keys, so we cannot decrypt them, and the third party also has no mechanism to provide us with the keys even though it is in their runtime environment. They only have secure mechanisms for us to upload keys onto their servers.
We are trying to migrate to a different third-party DRM provider, and would like to obtain new keys. Unfortunately, the developer portal won't let me create new keys, saying that we have exceeded the number of keys allowed, which I assume is one.
Additionally, the new DRM provider can only support SDK 4.x keys, and it appears that we can only request SDK 5.x keys on the Apple Developer portal, as the SDK 4.0 option is grayed out. Regardless, it seems that we are not able to request any keys.
We've submitted a request to the support e-mail address and received an automated e-mail that the response should take a few days, but may take longer on occasion. It's now been a month. The e-mail says that the reply address is not monitored. Is there any way we can accelerate this?
Thank you,
Carlos
Hi there,
We're working on offline playback of DRM tracks. The persistent keys (also known as track licenses) for offline playback are stored locally on the device and are served from cache when a user initiates playback of a downloaded track.
Our persistent keys have a limited validity time and need to be refreshed when they expire. To prevent a situation where a persistent key expires while the user is offline, we've decided to eagerly refresh these keys one week before their expiration date. To make that happen we need to be able to obtain the expiration date of the given track license.
We've been attempting to use the makeSecureTokenForExpirationDateOfPersistableContentKey API to facilitate this process. The documentation states that this API returns a secret token representing the persistent key, which we can then exchange with our license server for the expiration date: https://developer.apple.com/documentation/avfoundation/avcontentkeysession/makesecuretokenforexpirationdate(ofpersistablecontentkey:completionhandler:)?language=objc
However, every time we call makeSecureTokenForExpirationDateOfPersistableContentKey, we receive an error with code -46250. We haven't been able to find any public references or documentation for this specific error code, which is preventing us from troubleshooting the issue. We are conducting our tests on a physical device, as the simulator does not support FairPlay playback. We don't use dual expiry approach.
Is our understanding of how to obtain the expiration timestamp correct? Are we using the makeSecureTokenForExpirationDateOfPersistableContentKey API as it was intended? What does the -46250 error code mean, and what steps should we take to fix our FairPlay implementation to make this work?
Thanks in advance for your assistance.
Hello,
I'm developing an app that displays a photo library using UICollectionView and PHCachingImageManager. I'd like to achieve a user experience similar to the native iOS Photos app, where low-quality images are shown quickly while scrolling, and higher-quality images are loaded for visible cells once scrolling stops.
I'm currently using the following approach:
While Scrolling: I'm using the UICollectionViewDataSourcePrefetching protocol. In the prefetchItemsAt method, I call startCachingImages with low-quality options to cache images in advance.
After Scrolling Stops: In the scrollViewDidEndDecelerating method, I intend to load high-quality images for the currently visible cells.
I have a few questions regarding this approach:
What is the best practice for managing both low-quality and high-quality images efficiently with PHCachingImageManager? Is it correct to call startCachingImages with fastFormat options and then call it again with highQualityFormat in scrollViewDidEndDecelerating?
How can I minimize the delay when a low-quality image is replaced by a high-quality one? Are there any additional strategies to help pre-load high-quality images more effectively?
Topic:
Media Technologies
SubTopic:
Photos & Camera
I made a CMIOExtension (a virtual camera) which generates its own output, for use in our in-house software testing. I wanted to make a video source with 29.97, 30, 59.94 and 60fps output.
To this end, I created a CMIOExtensionDeviceSource which creates a CMIOExtensionDevice with one CMIOExtensionStreamSource with various stream formats contained in [CMIOExtensionStreamFormat], including one with both maxFrameDuration and minFrameDuration = CMTimeMake(value: 1000, timescale: 30000) and another with both maxFrameDuration and minFrameDuration = CMTimeMake(value: 1001, timescale: 30000)
I've held off on the creation of the 59.94/60fps source for now until this problem is resolved.
my virtual camera works, it produces a signal, but when I examine its associated AVCaptureDevice in the debugger, I find
(lldb) po self.captureDevice?.formats[0].videoSupportedFrameRateRanges[0].maxFrameDuration
▿ Optional<CMTime>
▿ some : CMTime
- value : 1000000
- timescale : 30000000
▿ flags : CMTimeFlags
- rawValue : 1
- epoch : 0
I get the same value, 1000000/30000000, or exactly 30fps, for all the formats of my AVCaptureDevice.
Is there something I'm doing wrong, or do CMIOExtensionDevices always round the frame rates?
I can't force CoreMediaIO to produce frames at exactly my desired frame interval, but I'd like to ensure that the average frame rate is my desired rate. How can I do that? Frame emission is governed by a repeating DispatchSourceTimer with a repeat time specified in nanoseconds with the TimerFlags set to 'strict'.
I’ve been using Apple Music API for quite a while now and a recent change must have happened which is quite disruptive.
On many occasions, artists release singles from an album as part of promoting this album. For recent examples, Harry Styles released “Aperture” (a single) to promote his upcoming album “Kiss All The Time. Disco, Occasionally“. Similarly, Bruno Mars released “I Just Might”, a single from the upcoming album “The Romantic”.
Previously, those would return at the endpoint ”artists/{artist_id}/albums” with a “- Single” suffix. But it seems a recent change happens where they only appear as playable tracks inside the album.
This behavior is also evident in the Apple Music app itself. Those singles no longer appear under “Singles & EPs”. Instead, they would only be visible if the single becomes popular enough to be shown on “Top Songs“. Otherwise one would have to know to tap on the (future) album to discover if there are released singles.
Meanwhile Spotify’s API returns those as singles properly, just like Apple Music API used to.
This change must be recent but the question is if it’s intentional, and if so, how can the API be used from here on out to “extract” those singles and represent them?
Looking to implement to UI to tell the user to clean their lens in our app.
Implemented the KVO for the cameraLensSmudgeDetectionStatus but I'm having issues reliably triggering it in, both in our app and the main camera app. Tried to get inventive by putting tupperware over the lens, but I think the model driving this or the LiDAR sensor might be smart enough to detect there is something close to the lens.
Is there any way to trigger this change in a similar way we can trigger thermal changes in debug?
Thanks.
In my app I use AVAssetReaderTrackOutput to extract PCM audio from a user-provided video or audio file and display it as a waveform.
Recently a user reported that the waveform is not in sync with his video, and after receiving the video I noticed that the waveform is in fact double as long as the video duration, i.e. it shows the audio in slow-motion, so to speak.
Until now I was using
CMFormatDescription.audioStreamBasicDescription.mSampleRate
which for this particular user video returns 22'050. But in this case it seems that this value is wrong... because the audio file has two audio channels with different sample rates, as returned by
CMFormatDescription.audioFormatList.map({ $0.mASBD.mSampleRate })
The first channel has a sample rate of 44'100, the second one 22'050. If I use the first sample rate, the waveform is perfectly in sync with the video.
The problem is given by the fact that the ratio between the audio data length and the sample rate multiplied by the audio duration is 8, double the ratio for the first audio file (4). In the code below this ratio is given by
Double(length) / (sampleRate * asset.duration.seconds)
When commenting out the line with the sampleRate variable definition in the code below and uncommenting the following line, the ratios for both audio files are 4, which is the expected result. I would expect audioStreamBasicDescription to return the correct sample rate, i.e. the one used by AVAssetReaderTrackOutput, which (I think) somehow merges the stereo tracks. The documentation is sparse, and in particular it’s not documented whether the lower or higher sample rate is used; in this case, it seems like the higher one is used, but audioStreamBasicDescription for some reason returns the lower one.
Does anybody know why this is the case or how I should extract the sample rate of the produced PCM audio data? Should I always take the higher one?
I created FB19620455.
let openPanel = NSOpenPanel()
openPanel.allowedContentTypes = [.audiovisualContent]
openPanel.runModal()
let url = openPanel.urls[0]
let asset = AVURLAsset(url: url)
let assetTrack = asset.tracks(withMediaType: .audio)[0]
let assetReader = try! AVAssetReader(asset: asset)
let readerOutput = AVAssetReaderTrackOutput(track: assetTrack, outputSettings: [AVFormatIDKey: Int(kAudioFormatLinearPCM), AVLinearPCMBitDepthKey: 16, AVLinearPCMIsBigEndianKey: false, AVLinearPCMIsFloatKey: false, AVLinearPCMIsNonInterleaved: false])
readerOutput.alwaysCopiesSampleData = false
assetReader.add(readerOutput)
let formatDescriptions = assetTrack.formatDescriptions as! [CMFormatDescription]
let sampleRate = formatDescriptions[0].audioStreamBasicDescription!.mSampleRate
//let sampleRate = formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate }).max()!
print(formatDescriptions[0].audioStreamBasicDescription!.mSampleRate)
print(formatDescriptions[0].audioFormatList.map({ $0.mASBD.mSampleRate }))
if !assetReader.startReading() {
preconditionFailure()
}
var length = 0
while assetReader.status == .reading {
guard let sampleBuffer = readerOutput.copyNextSampleBuffer(), let blockBuffer = sampleBuffer.dataBuffer else {
break
}
length += blockBuffer.dataLength
}
print(Double(length) / (sampleRate * asset.duration.seconds))