The downloader property must be a Callable with the same signature as the aria2c, curl_impersonate, and requests downloader functions. You can import and pass one of those functions, or pass a custom function with a matching signature.
Note: the chosen downloader will still be overridden with a fallback in one case: when the aria2c downloader is selected but the download requires the HTTP Range header.
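A minimal sketch of the two options; the module path devine.core.downloaders is an assumption, and `track` stands in for any object exposing the property described above:
```python
from devine.core.downloaders import requests as requests_downloader

# option 1: import and assign one of the built-in downloader functions
track.downloader = requests_downloader

# option 2: any custom Callable matching the built-in functions' signature
def my_downloader(*args, **kwargs):
    ...

track.downloader = my_downloader
```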
Closes #70
From aria2c's changelog (2007-09-02):
```
Now *.aria2 control file is first saved to *.aria2__temp and if it is successful, then renamed to *.aria2.
This prevents *.aria2 file from being truncated or corrupted when file system becomes out of space.
```
It seems something went wrong in 1.37.0, resulting in these files sometimes not being renamed back to `.aria2` and then being left there for good. The fix for devine is to simply detect `.aria2__temp` files and delete them once all segments finish downloading. My only worry here is the root cause of the failed rename. Did the download actually complete without error? According to aria2c's RPC, no errors occurred. There's no way to add support for aria2(c) 1.37.0 without a change of this sort, as the files do seem to download correctly regardless of the control file not being renamed, so the leftovers just need to be deleted.
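The workaround could look something like this (names are illustrative, not devine's actual code): once every segment has finished downloading, sweep the download directory for leftover control files:
```python
from pathlib import Path

def clean_aria2_temp_files(save_dir: Path) -> None:
    # Delete any *.aria2__temp control files aria2c 1.37.0 failed to rename.
    for control_file in save_dir.glob("*.aria2__temp"):
        control_file.unlink(missing_ok=True)
```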
Fixes #71
This considerably reduces the number of connections being made for playlists that constantly change keys, or that have new key data for every single segment (e.g., Pluto sometimes).
It also allows you to pass headers and cookies while still being able to supply a proxy.
It also simplifies the calculation of the wanted segment range when decrypting. Instead of storing the starting segment index number with the encryption_data variable, we just grab the first segment that isn't already merged.
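The connection saving can be illustrated with a small cache keyed by the EXT-X-KEY URI; an illustrative sketch (not devine's actual code), where the shared Session also carries the headers, cookies, and proxy mentioned above:
```python
import requests

key_cache: dict[str, bytes] = {}

def get_key(session: requests.Session, key_uri: str) -> bytes:
    # One request per unique key URI, no matter how many segments declare it;
    # the session supplies headers, cookies, and any proxy on every call.
    if key_uri not in key_cache:
        response = session.get(key_uri)
        response.raise_for_status()
        key_cache[key_uri] = response.content
    return key_cache[key_uri]
```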
Fixes #77
Storing data as a dictionary allows more places in the code (including DASH, HLS, Services, etc.) to get/set what they need by key, instead of by index into a list/tuple. Tuples or lists were typically used in services because DASH and HLS stored the needed data as a tuple, and services did not want to interrupt or remove that data, even though doing so would have been fine.
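A small sketch of the difference (the key names here are hypothetical, not devine's actual schema):
```python
# With a dict, DASH/HLS code and Services each get/set their own keys without
# caring about what the other stored or where it sits.
data: dict = {}
data["hls"] = {"playlist_url": "https://example.com/master.m3u8"}  # set by HLS code
data["service"] = {"title_id": "12345"}                            # set by a Service
title_id = data["service"]["title_id"]  # vs. fragile positional access like data[2][0]
```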
The decrypt() call just before it would have already included the map data, as it was needed for decryption. Therefore, it does not need to be added again when merge_discontinuity() is called. In some cases, re-adding the map data can cause playback or final-merge failure.
The way the m3u8 parser handles/groups #EXT-X tags into segment objects means the #EXT-X-DISCONTINUITY tag (the `discontinuity` property) is tied to whatever segment is below its line. Therefore, there's never a scenario where we need to merge+decrypt at the very first segment of the for loop, as there are no segments before it.
This can happen with playlists that are just slightly off-spec (can't blame them), but also when the OnSegmentFilter filters out all segments before the first EXT-X-DISCONTINUITY. This is common when filtering out bumpers/intros.
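The resulting guard is roughly this shape (illustrative, not devine's exact code):
```python
import m3u8

# python-m3u8 sets segment.discontinuity when #EXT-X-DISCONTINUITY appears on
# the line above that segment, so the very first segment can be skipped safely.
playlist = m3u8.load("https://example.com/media.m3u8")
for i, segment in enumerate(playlist.segments):
    if segment.discontinuity and i > 0:
        pass  # merge + decrypt everything gathered so far (hypothetical step)
```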
- It now downloads all segment files multi-threaded first, before any decryption or merging operations (excluding init data, which is downloaded in sequence/order after all the segments are downloaded).
- Once all segments are downloaded, it then goes through and does any merging, decryption, init data handling, etc.
- Segments are no longer decrypted one by one. If segments use the same EXT-X-KEY data, they are merged together and then decrypted in one pass (see the sketch after this list). This should give a noticeable speed increase for Widevine DRM.
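A hedged sketch of that batching idea, assuming python-m3u8 segment objects; the helper names (`download_path`, `merge_files`, `decrypt`) are hypothetical, not devine's actual functions:
```python
from itertools import groupby

# Group *consecutive* segments that share the same EXT-X-KEY URI, merge each
# group into one file, then run a single decryption pass per group instead of
# one per segment. A key URI of None means the segments are in the clear.
for key_uri, group in groupby(segments, key=lambda s: s.key.uri if s.key else None):
    files = [download_path(s) for s in group]  # hypothetical: path of each downloaded segment
    merged = merge_files(files)                # hypothetical: simple binary concat
    if key_uri:
        decrypt(merged, key_uri)               # one DRM call for the whole group
```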
HLS playlists where each segment is in an mp4 container seem to corrupt when the EXT-X-MAP is swapped out, unless you first merge segments by discontinuity and then merge those merges via FFmpeg (which demuxes all the merged segment continuities and then concatenates them together, probably giving them new init data too).
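For reference, that FFmpeg step can be done with the concat demuxer; a sketch under the assumption that each input file is one already-merged discontinuity:
```python
import subprocess
from pathlib import Path

def concat_discontinuities(parts: list[Path], output: Path) -> None:
    # Build a concat demuxer list file, one "file '...'" line per input.
    concat_list = output.with_suffix(".txt")
    concat_list.write_text("\n".join(f"file '{p.as_posix()}'" for p in parts))
    # Demux and re-concatenate without re-encoding; FFmpeg writes fresh
    # container-level data (including init data) for the joined output.
    subprocess.run(
        ["ffmpeg", "-f", "concat", "-safe", "0",
         "-i", str(concat_list), "-c", "copy", str(output)],
        check=True,
    )
```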
Previously, all entities were decoded in Subtitle files because of a problem with SubtitleEdit and its /ReverseRtlStartEnd option not being entity-aware.
It actually ends up reversing the `;` of `&rlm;` instead of the actual value of `&rlm;`. Therefore, I decoded all entities before SubtitleEdit could process the Subtitle, but this caused problems with more advanced formats like TTML and WebVTT, as `&lt;` would decode to `<`, causing syntax errors, among other problematic characters.
According to the TTML and WebVTT specs, HTML entity encoding is allowed, and that makes sense, or you wouldn't be able to use a literal `<` and the like. Any failure of players to show the decoded character would be a player problem and out of scope for devine.
This fixes some Subtitles having e.g., `&amp;` instead of just `&`, and especially matters for special entities like `&rlm;`, which enables Right-to-Left mode in Hebrew and Arabic Subtitles.
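To make the failure mode concrete, a minimal standard-library demonstration of why blanket entity decoding breaks XML-based formats:
```python
import html

ttml_line = "<p>1 &lt; 2 &amp; &rlm;מימין לשמאל</p>"
print(html.unescape(ttml_line))
# -> <p>1 < 2 & ‏מימין לשמאל</p>
# The bare "<" and "&" are no longer valid XML, so a TTML parser now fails,
# even though the &rlm; (an invisible RTL mark) decoded harmlessly.
```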
This was originally done to prevent *all* aria2c logs except on the last attempt, at which point, if every attempt had failed, it would let aria2c log the error.
However, that's bad practice, as aria2c may produce warnings or errors on, say, the 3rd attempt, and that 3rd attempt may have otherwise succeeded with those warnings or errors. It also generally shouldn't be necessary.
OK, so there are a few reasons this was done.
1) Design-wise, it isn't valid to have --proxy (or a proxy via config or otherwise) set a proxy and then have it unpredictably bypassed or disabled. If I specify `--proxy 127.0.0.1:8080`, I would expect it to be used for all communication, indefinitely, not switched in and out depending on the track or service.
2) Following from reason 1, it's also a security problem. The only reason I implemented it in the first place was so I could download faster on my home connection. This meant I would authenticate and call APIs through a proxy, then suddenly download manifests, segments, etc. over my home connection. A competent service could see that as an indicator of bad play and flag you.
3) Maintaining this setup across the codebase is extremely annoying, especially because of how proxies are set up and used by Requests in the Session. There's no way to tell a requests Session to temporarily disable the proxy and turn it back on later; you have to get the proxy from the session (in an annoying way), store it, remove it, make the calls, and then, assuming you're still in the same function, add it back. If you're not in the same function, well, time for some spaghetti code. The juggling looks roughly like the sketch below.
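A sketch of point 3 against a plain requests Session (requests has no "pause the proxy" switch):
```python
import requests

session = requests.Session()
session.proxies = {"all": "http://127.0.0.1:8080"}

saved = session.proxies                       # fetch and stash the current proxies
session.proxies = {}                          # strip them off the session
direct = session.get("https://example.com")   # make the proxy-less call(s)
session.proxies = saved                       # restore — assuming you're still in scope
```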
---
tldr; -1 ux/design/expectations with CLI, -1 security aspect, -1 code maintenance, but only +1 for potentially increased download speeds in certain scenarios.
* Add option for automatic subtitle character encoding normalization
The rationale behind this function is that some services use ISO-8859-1
(latin1) or Windows-1252 (CP-1252) instead of UTF-8 encoding, whether
intentionally or accidentally. Some services even stream subtitles with
malformed/mixed encoding (each segment has a different encoding).
* Remove Subtitle parameter `auto_fix_encoding`
Just always attempt to fix encoding. If the subtitle is neither UTF-8 nor CP-1252, then it should realistically error out instead of producing garbage Subtitle data anyway.
* Move Subtitle encoding-fixing code out of the `if drm` tree
* Use chardet as a last-ditch effort when fixing Subs, or return the original data
* Move Subtitle.fix_encoding method to utilities as try_ensure_utf8 (a rough sketch follows below)
* Add Shivelight as a contributor
---------
Co-authored-by: rlaphoenix <rlaphoenix@pm.me>
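A hedged sketch of what try_ensure_utf8 ends up doing, based on the notes above (the real implementation in devine's utilities may differ):
```python
import chardet

def try_ensure_utf8(data: bytes) -> bytes:
    # Try UTF-8 first, then CP-1252, then fall back to chardet detection;
    # if nothing works, return the original data untouched.
    for encoding in ("utf-8", "cp1252"):
        try:
            return data.decode(encoding).encode("utf-8")
        except UnicodeDecodeError:
            continue
    detected = chardet.detect(data).get("encoding")
    if detected:
        try:
            return data.decode(detected).encode("utf-8")
        except UnicodeDecodeError:
            pass
    return data  # last resort: hand back the data as-is
```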
For aria2c, I've simplified the operation by offloading most of the work of creating a cookie header to simply re-doing what Python-requests does. This results in the exact same cookies Python-requests would have used in a requests.get() call or the like. It supports multiple same-name cookies under different domains/paths, based on the URI of the mock request.
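Roughly, the approach can be reproduced with requests' own cookie machinery; a sketch, not devine's exact code:
```python
import requests
from requests.cookies import get_cookie_header

def cookie_header_for(session: requests.Session, url: str) -> str:
    # Prepare a mock GET the same way requests would for a real call, so
    # domain/path matching of same-name cookies behaves identically.
    mock_request = session.prepare_request(requests.Request("GET", url))
    return get_cookie_header(session.cookies, mock_request) or ""
```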
This used to be used even before devine was public, but it was constantly changed back and forth between urljoin(), another form of urljoin (something custom, I can't remember), and an if check + string concatenation.
However, I can confirm that a simple if check will not work, as the base URI might not even be in the same relative root. The if checks have also been inconsistent, with some checking whether the URL starts with http(s):// and some checking whether it does not have the base URI at the start of the string.
The if-check method simply cannot work as reliably as urljoin() can. The switch also fixes some services, as some HLS playlists have their m3u8 URL on a completely different root, subdomain, or even domain, which used to completely break segment downloading.
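For illustration, what urljoin() handles that the if checks didn't:
```python
from urllib.parse import urljoin

base = "https://cdn-a.example.com/hls/video/playlist.m3u8"
urljoin(base, "seg-1.ts")                        # -> .../hls/video/seg-1.ts (same directory)
urljoin(base, "/other-root/seg-1.ts")            # -> different root on the same host
urljoin(base, "https://cdn-b.example.com/s.ts")  # absolute URLs pass through untouched
```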