Major Updates to performance and usage

+ You can now pass the course URL to the -c argument instead of the course ID
+ Fixed function doc tags
+ Added experimental support for business accounts: the program should auto-detect the subdomain from the course URL and use it for all requests, so you shouldn't need to edit the file
- Removed the useless creation time metadata from ffmpeg
+ Updated arguments: ``course_url`` is now a required argument
- ``course_url`` will no longer be pulled from the .env file; you can still use the .env file for bearer tokens
+ Courses are now downloaded into folders titled by the course name instead of the course id
+ Updated README to reflect updates
+ Fixed a bug where external URL files would be appended to each time the downloader is restarted on a course
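
The subdomain auto-detection described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `COURSE_URL_RE` and `split_course_url` are hypothetical names, and the pattern is adapted from the one added to `main.py`, with the dots in `.udemy.com` escaped so they match only a literal dot. The sketch also returns a pair when nothing matches, on the assumption that the caller unpacks the result into two variables.

```python
import re

# Hypothetical pattern adapted from the regex added in this commit.
# The dots in ".udemy.com" are escaped so they match only a literal dot.
COURSE_URL_RE = re.compile(
    r"(?i)//(?P<portal_name>[^/.]+)\.udemy\.com/"
    r"(?:course(?:/draft)*/)?(?P<name_or_id>[a-zA-Z0-9_-]+)"
)


def split_course_url(url):
    """Return (portal_name, name_or_id), or (None, None) if the URL does not match."""
    match = COURSE_URL_RE.search(url)
    if match is None:
        # Returning a pair keeps tuple unpacking at the call site safe.
        return None, None
    return match.group("portal_name"), match.group("name_or_id")


if __name__ == "__main__":
    print(split_course_url("https://www.udemy.com/course/myawesomecourse/"))
```

For a business account, the first element would be the portal subdomain instead of `www`, and that value could then be substituted into the API host for every request.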
Puyodead1 2021-05-20 22:34:56 -04:00
parent 5dc9211fa0
commit 0782c42df7
2 changed files with 179 additions and 123 deletions


@ -29,12 +29,12 @@ You will need to get a few things before you can use this program:
- Decryption Key ID - Decryption Key ID
- Decryption Key - Decryption Key
- Udemy Course ID - Udemy Course URL
- Udemy Bearer Token - Udemy Bearer Token
### Setting up ### Setting up
- rename `.env.sample` to `.env` - rename `.env.sample` to `.env` _(you only need to do this if you plan to use the .env file to store your bearer token)_
- rename `keyfile.example.json` to `keyfile.json` - rename `keyfile.example.json` to `keyfile.json`
### Aquire bearer token ### Aquire bearer token
@ -49,17 +49,9 @@ You will need to get a few things before you can use this program:
- ![bearer token example](https://i.imgur.com/FhQdwgD.png) - ![bearer token example](https://i.imgur.com/FhQdwgD.png)
- enter this in the `.env` file after `UDEMY_BEARER=` (you can also pass this as an argument, see advanced usage for more information) - enter this in the `.env` file after `UDEMY_BEARER=` (you can also pass this as an argument, see advanced usage for more information)
### Aquire Course ID
- Follow above before following this
- locate the request url field
- ![request url](https://i.imgur.com/EUIV3bk.png)
- copy the number after `/api-2.0/courses/` as seen highlighed in the above picture
- enter this in the `.env` file after `UDEMY_COURSE_ID=` (you can also pass this as an argument, see advanced usage for more information)
### Key ID and Key ### Key ID and Key
It is up to you to aquire the key and key id. It is up to you to aquire the key and key id. Please don't ask me for help acquiring these, decrypting DRM protected content can be considered piracy.
- Enter the key and key id in the `keyfile.json` - Enter the key and key id in the `keyfile.json`
- ![keyfile example](https://i.imgur.com/wLPsqOR.png) - ![keyfile example](https://i.imgur.com/wLPsqOR.png)
@ -72,17 +64,18 @@ You can now run `python main.py` to start downloading. The course will download
# Advanced Usage # Advanced Usage
``` ```
usage: main.py [-h] [-d] [-b BEARER_TOKEN] [-c COURSE_ID] [-q QUALITY] [-l LANG] [--skip-lectures] [--download-assets] [--download-captions] usage: main.py [-h] -c COURSE_URL [-b BEARER_TOKEN] [-d] [-q QUALITY] [-l LANG] [--skip-lectures] [--download-assets]
[--download-captions]
Udemy Downloader Udemy Downloader
optional arguments: optional arguments:
-h, --help show this help message and exit -h, --help show this help message and exit
-d, --debug Use test_data.json rather than fetch from the udemy api. -c COURSE_URL, --course-url COURSE_URL
The URL of the course to download
-b BEARER_TOKEN, --bearer BEARER_TOKEN -b BEARER_TOKEN, --bearer BEARER_TOKEN
The Bearer token to use The Bearer token to use
-c COURSE_ID, --course-id COURSE_ID -d, --debug Use test_data.json rather than fetch from the udemy api.
The ID of the course to download
-q QUALITY, --quality QUALITY -q QUALITY, --quality QUALITY
Download specific video quality. (144, 360, 480, 720, 1080) Download specific video quality. (144, 360, 480, 720, 1080)
-l LANG, --lang LANG The language to download for captions (Default is en) -l LANG, --lang LANG The language to download for captions (Default is en)
@ -92,33 +85,31 @@ optional arguments:
``` ```
- Passing a Bearer Token and Course ID as an argument - Passing a Bearer Token and Course ID as an argument
- `python main.py -b <Bearer Token> -c <Course ID>` - `python main.py -c <Course URL> -b <Bearer Token>`
- `python main.py -c https://www.udemy.com/courses/myawesomecourse -b <Bearer Token>`
- Download a specific quality - Download a specific quality
- `python main.py -q 720` - `python main.py -c <Course URL> -q 720`
- Download assets along with lectures - Download assets along with lectures
- `python main.py --download-assets` - `python main.py -c <Course URL> --download-assets`
- Download assets and specify a quality - Download assets and specify a quality
- `python main.py -q 360 --download-assets` - `python main.py -c <Course URL> -q 360 --download-assets`
- Download captions (Defaults to English) - Download captions (Defaults to English)
- `python main.py --download-captions` - `python main.py -c <Course URL> --download-captions`
- Download captions with specific language - Download captions with specific language
- `python main.py --download-captions -l en` - English subtitles - `python main.py -c <Course URL> --download-captions -l en` - English subtitles
- `python main.py --download-captions -l es` - Spanish subtitles - `python main.py -c <Course URL> --download-captions -l es` - Spanish subtitles
- `python main.py --download-captions -l it` - Italian subtitles - `python main.py -c <Course URL> --download-captions -l it` - Italian subtitles
- `python main.py --download-captions -l pl` - Polish Subtitles - `python main.py -c <Course URL> --download-captions -l pl` - Polish Subtitles
- `python main.py --download-captions -l all` - Downloads all subtitles - `python main.py -c <Course URL> --download-captions -l all` - Downloads all subtitles
- etc - etc
- Skip downloading lecture videos - Skip downloading lecture videos
- `python main.py --skip-lectures --download-captions` - Downloads only captions - `python main.py --skip-lectures --download-captions` - Downloads only captions
- `python main.py --skip-lectures --download-assets` - Downloads only assets - `python main.py --skip-lectures --download-assets` - Downloads only assets
# Getting an error about "Accepting the latest terms of service"?
- If you are using Udemy business, you must edit `main.py` and change `udemy.com` to `<portal name>.udemy.com`
# Credits # Credits
- https://github.com/Jayapraveen/Drm-Dash-stream-downloader - For the original code which this is based on - https://github.com/Jayapraveen/Drm-Dash-stream-downloader - For the original code which this is based on
- https://github.com/alastairmccormack/pywvpssh - For code related to PSSH extraction - https://github.com/alastairmccormack/pywvpssh - For code related to PSSH extraction
- https://github.com/alastairmccormack/pymp4parse/ - For code related to mp4 box parsing (used by pywvpssh) - https://github.com/alastairmccormack/pymp4parse - For code related to mp4 box parsing (used by pywvpssh)
- https://github.com/lbrayner/vtt-to-srt - For code related to converting subtitles from vtt to srt format - https://github.com/lbrayner/vtt-to-srt - For code related to converting subtitles from vtt to srt format
- https://github.com/r0oth3x49/udemy-dl - For some of the informaton related to using the udemy api

251
main.py

@ -1,4 +1,4 @@
import os, requests, shutil, json, glob, urllib.request, argparse, sys import os, requests, shutil, json, glob, urllib.request, argparse, sys, re
from sanitize_filename import sanitize from sanitize_filename import sanitize
import urllib.request import urllib.request
from tqdm import tqdm from tqdm import tqdm
@ -11,19 +11,12 @@ from mpegdash.utils import (parse_attr_value, parse_child_nodes,
from utils import extract_kid from utils import extract_kid
from vtt_to_srt import convert from vtt_to_srt import convert
course_id = None
header_bearer = None
download_dir = "%s\out_dir" % os.getcwd() download_dir = "%s\out_dir" % os.getcwd()
working_dir = "%s\working_dir" % os.getcwd( working_dir = "%s\working_dir" % os.getcwd(
) # set the folder to download segments for DRM videos ) # set the folder to download segments for DRM videos
retry = 3 retry = 3
home_dir = os.getcwd() home_dir = os.getcwd()
keyfile_path = "%s\keyfile.json" % os.getcwd() keyfile_path = "%s\keyfile.json" % os.getcwd()
dl_assets = False
dl_captions = False
skip_lectures = False
caption_locale = "en"
quality = None # None will download the best possible
valid_qualities = [144, 360, 480, 720, 1080] valid_qualities = [144, 360, 480, 720, 1080]
if not os.path.exists(working_dir): if not os.path.exists(working_dir):
@ -36,12 +29,25 @@ if not os.path.exists(download_dir):
with open(keyfile_path, 'r') as keyfile: with open(keyfile_path, 'r') as keyfile:
keyfile = keyfile.read() keyfile = keyfile.read()
keyfile = json.loads(keyfile) keyfile = json.loads(keyfile)
"""
@author Jayapraveen
""" def extract_course_name(url):
"""
@author r0oth3x49
"""
obj = re.search(
r"(?i)(?://(?P<portal_name>.+?).udemy.com/(?:course(/draft)*/)?(?P<name_or_id>[a-zA-Z0-9_-]+))",
url,
)
if obj:
return obj.group("portal_name"), obj.group("name_or_id")
def durationtoseconds(period): def durationtoseconds(period):
"""
@author Jayapraveen
"""
#Duration format in PTxDxHxMxS #Duration format in PTxDxHxMxS
if (period[:2] == "PT"): if (period[:2] == "PT"):
period = period[2:] period = period[2:]
@ -110,12 +116,10 @@ def download_media(filename, url, lecture_working_dir, epoch=0):
download_media(filename, url, lecture_working_dir, epoch + 1) download_media(filename, url, lecture_working_dir, epoch + 1)
"""
@author Jayapraveen
"""
def cleanup(path): def cleanup(path):
"""
@author Jayapraveen
"""
leftover_files = glob.glob(path + '/*.mp4', recursive=True) leftover_files = glob.glob(path + '/*.mp4', recursive=True)
mpd_files = glob.glob(path + '/*.mpd', recursive=True) mpd_files = glob.glob(path + '/*.mpd', recursive=True)
leftover_files = leftover_files + mpd_files leftover_files = leftover_files + mpd_files
@ -126,25 +130,21 @@ def cleanup(path):
print(f"Error deleting file: {file_list}") print(f"Error deleting file: {file_list}")
"""
@author Jayapraveen
"""
def mux_process(video_title, lecture_working_dir, outfile): def mux_process(video_title, lecture_working_dir, outfile):
"""
@author Jayapraveen
"""
if os.name == "nt": if os.name == "nt":
command = f"ffmpeg -y -i \"{lecture_working_dir}\\decrypted_audio.mp4\" -i \"{lecture_working_dir}\\decrypted_video.mp4\" -acodec copy -vcodec copy -fflags +bitexact -map_metadata -1 -metadata title=\"{video_title}\" -metadata creation_time=2020-00-00T70:05:30.000000Z \"{outfile}\"" command = f"ffmpeg -y -i \"{lecture_working_dir}\\decrypted_audio.mp4\" -i \"{lecture_working_dir}\\decrypted_video.mp4\" -acodec copy -vcodec copy -fflags +bitexact -map_metadata -1 -metadata title=\"{video_title}\" \"{outfile}\""
else: else:
command = f"nice -n 7 ffmpeg -y -i \"{lecture_working_dir}\\decrypted_audio.mp4\" -i \"{lecture_working_dir}\\decrypted_video.mp4\" -acodec copy -vcodec copy -fflags +bitexact -map_metadata -1 -metadata title=\"{video_title}\" -metadata creation_time=2020-00-00T70:05:30.000000Z \"{outfile}\"" command = f"nice -n 7 ffmpeg -y -i \"{lecture_working_dir}\\decrypted_audio.mp4\" -i \"{lecture_working_dir}\\decrypted_video.mp4\" -acodec copy -vcodec copy -fflags +bitexact -map_metadata -1 -metadata title=\"{video_title}\" \"{outfile}\""
os.system(command) os.system(command)
"""
@author Jayapraveen
"""
def decrypt(kid, filename, lecture_working_dir): def decrypt(kid, filename, lecture_working_dir):
"""
@author Jayapraveen
"""
try: try:
key = keyfile[kid.lower()] key = keyfile[kid.lower()]
except KeyError as error: except KeyError as error:
@ -159,13 +159,11 @@ def decrypt(kid, filename, lecture_working_dir):
) )
"""
@author Jayapraveen
"""
def handle_irregular_segments(media_info, video_title, lecture_working_dir, def handle_irregular_segments(media_info, video_title, lecture_working_dir,
output_path): output_path):
"""
@author Jayapraveen
"""
no_segment, video_url, video_init, video_extension, no_segment, audio_url, audio_init, audio_extension = media_info no_segment, video_url, video_init, video_extension, no_segment, audio_url, audio_init, audio_extension = media_info
download_media("video_0.seg.mp4", video_init, lecture_working_dir) download_media("video_0.seg.mp4", video_init, lecture_working_dir)
video_kid = extract_kid(f"{lecture_working_dir}\\video_0.seg.mp4") video_kid = extract_kid(f"{lecture_working_dir}\\video_0.seg.mp4")
@ -211,12 +209,10 @@ def handle_irregular_segments(media_info, video_title, lecture_working_dir,
break break
"""
@author Jayapraveen
"""
def manifest_parser(mpd_url): def manifest_parser(mpd_url):
"""
@author Jayapraveen
"""
video = [] video = []
audio = [] audio = []
manifest = requests.get(mpd_url).text manifest = requests.get(mpd_url).text
@ -280,16 +276,9 @@ def manifest_parser(mpd_url):
return video + audio return video + audio
"""
@author Puyodead1
"""
def download(url, path, filename): def download(url, path, filename):
""" """
@param: url to download file @author Puyodead1
@param: path place to put the file
@oaram: filename used for progress bar
""" """
file_size = int(requests.head(url).headers["Content-Length"]) file_size = int(requests.head(url).headers["Content-Length"])
if os.path.exists(path): if os.path.exists(path):
@ -354,7 +343,8 @@ def process_caption(caption,
print(f"> Error converting captions: {e}") print(f"> Error converting captions: {e}")
def process_lecture(lecture, lecture_index, lecture_path, lecture_dir): def process_lecture(lecture, lecture_index, lecture_path, lecture_dir, quality,
skip_lectures, dl_assets, dl_captions, caption_locale):
lecture_title = lecture["title"] lecture_title = lecture["title"]
lecture_asset = lecture["asset"] lecture_asset = lecture["asset"]
if not skip_lectures: if not skip_lectures:
@ -409,6 +399,7 @@ def process_lecture(lecture, lecture_index, lecture_path, lecture_dir):
# process assets # process assets
if dl_assets: if dl_assets:
assets = [] assets = []
text_assets = ""
all_assets = lecture["supplementary_assets"] all_assets = lecture["supplementary_assets"]
for asset in all_assets: for asset in all_assets:
if asset["asset_type"] == "File": if asset["asset_type"] == "File":
@ -437,9 +428,16 @@ def process_lecture(lecture, lecture_index, lecture_path, lecture_dir):
assets.append(asset) assets.append(asset)
asset_path = f"%s\\%s. External URLs.txt" % (lecture_dir, asset_path = f"%s\\%s. External URLs.txt" % (lecture_dir,
lecture_index) lecture_index)
with open(asset_path, 'a') as f: # with open(asset_path, 'a') as f:
f.write(f"%s : %s\n" % # f.write(f"%s : %s\n" %
(asset["title"], asset["external_url"])) # (asset["title"], asset["external_url"]))
text_assets += f"%s: %s\n" % (asset["title"],
asset["external_url"])
if not text_assets == "":
with open(asset_path, 'w') as f:
f.write(text_assets)
print("> Found %s assets for lecture '%s'" % print("> Found %s assets for lecture '%s'" %
(len(assets), lecture_title)) (len(assets), lecture_title))
@ -470,8 +468,9 @@ def process_lecture(lecture, lecture_index, lecture_path, lecture_dir):
process_caption(caption, lecture_index, lecture_title, lecture_dir) process_caption(caption, lecture_index, lecture_title, lecture_dir)
def parse(data): def parse(data, course_id, course_name, skip_lectures, dl_assets, dl_captions,
course_dir = f"%s\\%s" % (download_dir, course_id) quality, caption_locale):
course_dir = f"%s\\%s" % (download_dir, course_name)
if not os.path.exists(course_dir): if not os.path.exists(course_dir):
os.mkdir(course_dir) os.mkdir(course_dir)
chapters = [] chapters = []
@ -491,7 +490,9 @@ def parse(data):
lecture_index = lectures.index(obj) + 1 lecture_index = lectures.index(obj) + 1
lecture_path = f"%s\\%s. %s.mp4" % (course_dir, lecture_index, lecture_path = f"%s\\%s. %s.mp4" % (course_dir, lecture_index,
sanitize(obj["title"])) sanitize(obj["title"]))
process_lecture(obj, lecture_index, lecture_path, download_dir) process_lecture(obj, lecture_index, lecture_path, download_dir,
quality, skip_lectures, dl_assets, dl_captions,
caption_locale)
for chapter in chapters: for chapter in chapters:
chapter_dir = f"%s\\%s. %s" % (course_dir, chapters.index(chapter) + 1, chapter_dir = f"%s\\%s. %s" % (course_dir, chapters.index(chapter) + 1,
@ -503,21 +504,56 @@ def parse(data):
lecture_index = chapter["lectures"].index(lecture) + 1 lecture_index = chapter["lectures"].index(lecture) + 1
lecture_path = f"%s\\%s. %s.mp4" % (chapter_dir, lecture_index, lecture_path = f"%s\\%s. %s.mp4" % (chapter_dir, lecture_index,
sanitize(lecture["title"])) sanitize(lecture["title"]))
process_lecture(lecture, lecture_index, lecture_path, chapter_dir) process_lecture(lecture, lecture_index, lecture_path, chapter_dir,
quality, skip_lectures, dl_assets, dl_captions,
caption_locale)
print("\n\n\n\n\n\n\n\n=====================") print("\n\n\n\n\n\n\n\n=====================")
print("All downloads completed for course!") print("All downloads completed for course!")
print("=====================") print("=====================")
def fetch_subscribed_courses_json(bearer_token, portal_name):
res = requests.get(
"https://{portal_name}.udemy.com/api-2.0/users/me/subscribed-courses?fields[course]=id,url,title,published_title&ordering=-last_accessed,-access_time&page=1&page_size=10000"
.format(portal_name=portal_name),
headers={
"Authorization":
bearer_token,
"x-udemy-authorization":
bearer_token,
"Host":
"{portal_name}.udemy.com".format(portal_name=portal_name),
"Referer":
"https://{portal_name}.udemy.com/home/my-courses/search/?q={course_name}"
.format(portal_name=portal_name, course_name=course_name)
})
res.raise_for_status()
data = res.json()
return data
def fetch_course_json(course_id, bearer_token, portal_name, course_name):
res = requests.get(
"https://{portal_name}.udemy.com/api-2.0/courses/{course_id}/cached-subscriber-curriculum-items?fields[asset]=results,title,external_url,time_estimation,download_urls,slide_urls,filename,asset_type,captions,media_license_token,course_is_drmed,media_sources,stream_urls,body&fields[chapter]=object_index,title,sort_order&fields[lecture]=id,title,object_index,asset,supplementary_assets,view_html&page_size=10000"
.format(portal_name=portal_name, course_id=course_id),
headers={
"Authorization": bearer_token,
"x-udemy-authorization": bearer_token,
"Host": "{portal_name}.udemy.com".format(portal_name=portal_name),
"Referer": "https://{portal_name}.udemy.com/"
})
res.raise_for_status()
return res.json()
if __name__ == "__main__": if __name__ == "__main__":
parser = argparse.ArgumentParser(description='Udemy Downloader') parser = argparse.ArgumentParser(description='Udemy Downloader')
parser.add_argument( parser.add_argument("-c",
"-d", "--course-url",
"--debug", dest="course_url",
dest="debug", type=str,
action="store_true", help="The URL of the course to download",
help="Use test_data.json rather than fetch from the udemy api.", required=True)
)
parser.add_argument( parser.add_argument(
"-b", "-b",
"--bearer", "--bearer",
@ -526,11 +562,11 @@ if __name__ == "__main__":
help="The Bearer token to use", help="The Bearer token to use",
) )
parser.add_argument( parser.add_argument(
"-c", "-d",
"--course-id", "--debug",
dest="course_id", dest="debug",
type=str, action="store_true",
help="The ID of the course to download", help="Use test_data.json rather than fetch from the udemy api.",
) )
parser.add_argument( parser.add_argument(
"-q", "-q",
@ -565,6 +601,15 @@ if __name__ == "__main__":
help="If specified, captions will be downloaded.", help="If specified, captions will be downloaded.",
) )
dl_assets = False
skip_lectures = False
dl_captions = False
caption_locale = "en"
quality = None
bearer_token = None
portal_name = None
course_name = None
args = parser.parse_args() args = parser.parse_args()
if args.download_assets: if args.download_assets:
dl_assets = True dl_assets = True
@ -583,42 +628,62 @@ if __name__ == "__main__":
load_dotenv() load_dotenv()
if args.bearer_token: if args.bearer_token:
header_bearer = f"Bearer %s" % args.bearer_token bearer_token = f"Bearer %s" % args.bearer_token
else: else:
header_bearer = f"Bearer %s" % os.getenv("UDEMY_BEARER") bearer_token = f"Bearer %s" % os.getenv("UDEMY_BEARER")
if args.course_id:
course_id = args.course_id
else:
course_id = os.getenv("UDEMY_COURSE_ID")
if not course_id: if args.course_url:
print("> Missing Course ID!") portal_name, course_name = extract_course_name(args.course_url)
if not course_name:
print("> Unable to extract course name from URL!")
sys.exit(1) sys.exit(1)
if not header_bearer: if not portal_name:
print("> Unable to extract portal name from URL!")
sys.exit(1)
if not bearer_token:
print("> Missing Bearer Token!") print("> Missing Bearer Token!")
sys.exit(1) sys.exit(1)
print(f"> Using course ID {course_id}") print(f"> Fetching subscribed course data...")
try:
subscribed_courses = fetch_subscribed_courses_json(
bearer_token, portal_name)
except Exception as e:
print("> Failed to fetch subscribed course information: %s" % e)
course = next((x for x in subscribed_courses["results"]
if x["published_title"] == course_name), None)
if not course:
print("> Failed to find course in course list!")
sys.exit(1)
course_id = course["id"]
course_title = course["title"]
print(
f"> Fetching information for course '%s', this might take a minute..."
% course_name)
try:
course_data = fetch_course_json(course_id, bearer_token, portal_name,
course_name)
except Exception as e:
print("> Failed to fetch course information: %s" % e)
sys.exit(1)
if not course_data:
print("> Failed to fetch course data!")
print("> Course information retrieved!")
if args.debug: if args.debug:
# this is for development purposes so we dont need to make tons of requests when testing # this is for development purposes so we dont need to make tons of requests when testing
# course data json is just stored and read from a file # course data json is just stored and read from a file
with open("test_data.json", encoding="utf8") as f: with open("test_data.json", encoding="utf8") as f:
data = json.loads(f.read())["results"] course_data = json.loads(f.read())
parse(data) parse(course_data["results"], course_id, course_name,
skip_lectures, dl_assets, dl_captions, quality,
caption_locale)
else: else:
print("Fetching Course data, this may take a minute...") parse(course_data["results"], course_id, course_name, skip_lectures,
r = requests.get( dl_assets, dl_captions, quality, caption_locale)
f"https://udemy.com/api-2.0/courses/{course_id}/cached-subscriber-curriculum-items?fields[asset]=results,title,external_url,time_estimation,download_urls,slide_urls,filename,asset_type,captions,media_license_token,course_is_drmed,media_sources,stream_urls,body&fields[chapter]=object_index,title,sort_order&fields[lecture]=id,title,object_index,asset,supplementary_assets,view_html&page_size=10000"
.format(course_id),
headers={
"Authorization": header_bearer,
"x-udemy-authorization": header_bearer
})
if r.status_code == 200:
print("Course data retrieved!")
data = r.json()
parse(data["results"])
else:
print("An error occurred while trying to fetch the course data! " +
r.text)