6
\$\begingroup\$

Why I made it

This small project combines what I learnt about ASCII animation in an earlier project (an ASCII-art animated RickRoll) with what I learnt about beautifulsoup in another one (a basic ASCII-based web browser).

How it works

Here are the steps it takes to go from a search query to an ASCII animation:

  1. It first asks the user for a search query.
  2. Then, it sends a GET request to Yandex, searching for the query under the videos section,
  3. and uses beautifulsoup to find all the .mp4 video preview URLs.
  4. With the URLs now in a list, it randomly selects one of them and downloads the video from it.
  5. It uses ffmpeg to get the first frame of the video and turns it into ASCII art for the user to see.
  6. It then checks how much time it took to do step 5 and adds that to the current time for a 1:1 timing,
  7. and repeats step 5 but with the updated time from how long it took to get the preceding frame until it reaches 10 seconds, which is the length of Yandex video previews.

Requirements

You need to have ffmpeg installed before running the script. Simply enter this: sudo apt install ffmpeg

Code

Here is the Python code:

import os
import re
import sys
import time
import random
import shutil
import requests
from ascii_magic import AsciiArt
from bs4 import BeautifulSoup


# -------------------- Dependency checks --------------------

def ensure_ffmpeg_installed():
    """
    Ensure ffmpeg is available on the system.
    Exit cleanly if it is missing.
    """
    if shutil.which("ffmpeg") is None:
        print("ffmpeg is not installed.")
        print("Install it with:")
        print("  sudo apt install ffmpeg")
        sys.exit(1)


# -------------------- Network / parsing --------------------

def get_preview_from_yandex(url):
    """
    Fetch a Yandex video search page and extract preview MP4 URLs.

    :param url: Yandex video search URL
    :return: One randomly selected preview video URL
    """
    response = requests.get(url, timeout=10)
    response.raise_for_status()

    soup = BeautifulSoup(response.text, "html.parser")

    previews = set(
        re.findall(r"https://[^\"']+\.mp4", soup.prettify())
    )

    if not previews:
        print("No preview videos found.")
        sys.exit(1)

    return random.choice(list(previews))


# -------------------- Media handling --------------------

def extract_frame(video_url, timestamp, output_path):
    """
    Extract a single frame from a remote video using ffmpeg.
    """
    command = (
        f'ffmpeg -loglevel error -ss {timestamp:.3f} '
        f'-i "{video_url}" -frames:v 1 "{output_path}" -y'
    )

    if os.system(command) != 0:
        print("ffmpeg failed to extract frame.")
        sys.exit(1)


def render_ascii(image_path):
    """
    Convert an image to ASCII and render it to the terminal.
    """
    art = AsciiArt.from_image(image_path)
    art.to_terminal(
        columns=os.get_terminal_size().columns,
        width_ratio=2,
        monochrome=False
    )


# -------------------- Playback loop --------------------

def play_preview(video_url):
    """
    Render frames from the video at a fixed frequency.
    """
    currently_at = 0.0  # seconds into the video

    while currently_at < 10.0:
        start_time = time.perf_counter()

        extract_frame(video_url, currently_at, "data.jpg")

        try:
            render_ascii("data.jpg")
        except OSError as e:
            print(f"Image rendering failed: {e}")
            sys.exit(1)
        finally:
            if os.path.exists("data.jpg"):
                os.remove("data.jpg")

        elapsed = time.perf_counter() - start_time

        currently_at += elapsed


# -------------------- Main entry --------------------

def main():
    ensure_ffmpeg_installed()
    search_query = input("Enter Yandex search query: ")
    search_url = f"https://yandex.com/video/search?text={search_query}"

    video_url = get_preview_from_yandex(search_url)
    play_preview(video_url)


if __name__ == "__main__":
    main()

Preview

Result of entering the search query "Minecraft"

Please tell me everything that comes to mind!

\$\endgroup\$
1
  • \$\begingroup\$ I would edit the code for this, because no answer has yet gone over this, but I feel it is more appropriate to simply say it as a comment: I just found out that downloading the video from the url first and then taking each frame from it instead of getting the frames directly from the url (like it currently does) makes it up to roughly 20 times faster than this current script. If you want to add this to your answers, be my guest. \$\endgroup\$ Commented 12 hours ago

5 Answers

3
\$\begingroup\$

Good job overall. The code is neatly split in dedicated functions and function names are descriptive. The code is structured and easy to follow.

Logging

I believe it would be appropriate to use the logging module instead of printing to the console, because of:

  • flexibility: output to console and to a log file, or other destinations
  • ability to control verbosity depending on level
  • the console buffer size is limited, you could miss some messages

This is even more important if the script is going to run unattended.
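As a rough illustration of the points above, here is a minimal sketch that sends messages to both the console and a log file, with a verbosity switch for the console. The function name `configure_logging`, the logger name `ascii_preview`, and the default log file name are my own placeholders, not something from the original script.

```python
import logging
import sys


def configure_logging(log_path="ascii_preview.log", verbose=False):
    """Log to the console and to a file (all names here are hypothetical)."""
    logger = logging.getLogger("ascii_preview")
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    # Console: INFO and above by default, DEBUG when verbose is requested.
    console = logging.StreamHandler(sys.stderr)
    console.setLevel(logging.DEBUG if verbose else logging.INFO)

    # File: keep everything, so nothing is lost to a short console buffer.
    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
    for handler in (console, file_handler):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

With something like this in place, a call such as `print("ffmpeg is not installed.")` could become `logger.error("ffmpeg is not installed.")`.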

As already mentioned, the installation instructions for ffmpeg are for Linux only, and only for Debian-like systems at that. If you added these instructions with the intention of distributing your app (e.g. as a GitHub repo), then consider extending them to cover more operating systems and distributions.

You could move the instructions to a separate README file, along with instructions for Python where appropriate. You have three dependencies as I can see, so there would be a requirements file, unless you use another mechanism like Poetry.

Better shell

You can improve the shell invocation by retrieving the output of the command, in addition to the status code.

Example:

import subprocess
command = ["ls"]  # a list of arguments, since shell=False does not parse a command string
status = subprocess.run(command, shell=False, capture_output=True, text=True)

Then check the value of status.returncode. If different than 0, then you can log status.stdout and status.stderr for troubleshooting purposes, and also the full command that was attempted. This will make troubleshooting immediately easier.

From the docs:

If capture_output is true, stdout and stderr will be captured.

A timeout may be specified in seconds

So you can terminate the process after a preset delay, so that your script does not remain stuck forever on a problematic command, or an unexpected prompt.
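To make this concrete, here is a small sketch that combines `capture_output` and `timeout`. The helper name `run_command` and the 30-second default are assumptions of mine, and the example runs the Python interpreter itself as a stand-in command so that it works without ffmpeg being installed.

```python
import subprocess
import sys


def run_command(argv, timeout_seconds=30.0):
    """Run argv without a shell and return (ok, stdout, stderr)."""
    try:
        result = subprocess.run(
            argv,
            capture_output=True,  # collect stdout and stderr for troubleshooting
            text=True,            # decode the output as str instead of bytes
            timeout=timeout_seconds,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False, "", f"timed out after {timeout_seconds} s: {argv}"
    return result.returncode == 0, result.stdout, result.stderr


# Stand-in command: the current Python interpreter printing one line.
ok, out, err = run_command([sys.executable, "-c", "print('frame ok')"])
if not ok:
    print(f"command failed: {err}", file=sys.stderr)
```

The caller gets the captured output whether the command succeeded or not, so it can log the full command together with `stderr` when something goes wrong.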

Security

Careful with untrusted input. An attacker could perform command injection by tweaking file names. This is not necessarily trivial to achieve in this case with URLs, but the risk exists at least in theory.
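One common mitigation is to pass the command as a list of arguments and not go through a shell at all. The sketch below uses a made-up hostile "URL" and, as a harmless stand-in for ffmpeg, a child Python process that just echoes its first argument; it is only meant to show that the string arrives as a single argument.

```python
import subprocess
import sys

# A hostile "URL" that would start a second command if pasted into a shell string.
hostile_url = 'https://example.com/a.mp4"; echo INJECTED; "'

# With an argument list and no shell, the whole string reaches the child
# process as one argument; nothing in it is interpreted as shell syntax.
result = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.argv[1])", hostile_url],
    capture_output=True,
    text=True,
    check=True,
)
received = result.stdout.rstrip("\n")
```

Here `received` is the same string that was passed in, quotes and semicolons included, and no `echo` command is ever executed.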

\$\endgroup\$
2
\$\begingroup\$

User Interface

I entered a query for which there were no previews available. A suitable message was displayed, but then the program exited. I think it would be more user-friendly if, in that situation, you prompted the user for another query instead of exiting. So that the user still has a way to leave the program when no suitable query can be found, you might exit if the query is the empty string or some other sentinel value. Likewise, if a preview is available but a frame cannot be extracted or rendered, you currently exit. Instead, you might ask for a new search query.
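A possible shape for such a loop is sketched below. Both `prompt_until_found` and the `find_preview` callback are hypothetical names; I am assuming a lookup function that returns a preview URL or `None` rather than calling `sys.exit`.

```python
def prompt_until_found(find_preview, ask=input):
    """Keep asking for a query until a preview is found or the user quits.

    find_preview is a hypothetical callable that returns a URL or None.
    An empty query ends the loop and returns None.
    """
    while True:
        query = ask("Enter Yandex search query (blank to quit): ").strip()
        if not query:
            return None
        video_url = find_preview(query)
        if video_url is not None:
            return video_url
        print("No preview videos found, try another query.")
```

Passing `ask` as a parameter is only there to make the loop easy to exercise without typing at a terminal.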

If ffmpeg is not installed you then provide instructions for installing it that are for Linux. If that is the only platform that this application is intended for, then that is fine. But would it be useful to provide different instructions according to the platform?
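If you do want platform-specific hints, one way is a small lookup keyed on `platform.system()`. The table below is an assumption on my part: it keeps the apt command from the question, suggests Homebrew on macOS, and points everyone else at the ffmpeg download page.

```python
import platform

# Hypothetical hint table: apt is taken from the question, the other
# entries are suggestions and not part of the original script.
INSTALL_HINTS = {
    "Linux": "sudo apt install ffmpeg  (Debian/Ubuntu; otherwise use your distribution's package manager)",
    "Darwin": "brew install ffmpeg  (requires Homebrew)",
    "Windows": "download a build from https://ffmpeg.org/download.html and add it to PATH",
}


def ffmpeg_install_hint(system=None):
    """Return an install hint for the given (or current) operating system."""
    system = system or platform.system()
    return INSTALL_HINTS.get(system, "see https://ffmpeg.org/download.html")
```

`ensure_ffmpeg_installed()` could then print `ffmpeg_install_hint()` in place of the hard-coded apt line.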

data.jpg?

If this application is meant to be used by others, you do not want to be clobbering their own data.jpg file if they happen to have one in the current working directory. You should instead use a temporary file for this.
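For instance, a sketch along these lines keeps the frame inside a private temporary directory that is removed automatically. The helper `with_temporary_frame`, the callback `use_frame`, and the file name `frame.jpg` are names I made up for the example.

```python
import tempfile
from pathlib import Path


def with_temporary_frame(use_frame):
    """Call use_frame (a hypothetical callback) with a frame path that lives
    in a private temporary directory, then clean the directory up."""
    with tempfile.TemporaryDirectory(prefix="ascii_preview_") as tmp_dir:
        frame_path = Path(tmp_dir) / "frame.jpg"
        return use_frame(frame_path)
```

Because the directory is created fresh for each run, nothing in the user's working directory can be overwritten, and no explicit `os.remove` is needed.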

Code Structure

You have decomposed the logic into well-defined functions, each of which does a single thing and is described by a docstring. This makes the code easy to maintain and extend. You should add a docstring for the entire module explaining what the script does.

You have an if __name__ == "__main__": guard allowing the module to be imported without automatically executing. If you envision that the module might be imported, then consider renaming all functions except for main with a leading underscore. This tells the importer that those functions should be considered "private". You might then wish to rename main to something more application-specific, e.g. run_preview_animation.

Overall Impression

Nice job!

\$\endgroup\$
2
\$\begingroup\$

pathlib

Prefer modern concise usage, over the ancient clunky os.path API.

ASCII art

Don't.

# -------------------- Dependency checks --------------------

def ensure_ffmpeg_installed():

The -------- isn't helping us. And a maintainer will find it a pain to re-center the text after adding a word or two in a couple of months.

The comment is redundant with the (well chosen) identifier.

Organizing the source code is a laudable goal. Please use the mechanism python provides for that purpose: modules. Place the function within dependency_check.py or similar.

Further down we find that network / parsing is redundant with get preview from yandex, playback loop is redundant with play preview, and main entry is redundant with main. These are all 1:1 relationships.

The only "grouping" in the OP is for media handling. Consider putting extract_frame() and render_ascii() within media_handling.py

docstrings

On the whole, these are great. Each function has one, and they do a good job of telling the caller about what will happen.

    """
    Fetch a Yandex video search page and extract preview MP4 URLs.

    :param url: Yandex video search URL
    :return: One randomly selected preview video URL
    """

Here we're using what looks like the reStructuredText (Sphinx) style of docstring, with :param: and :return: fields, which can be quite helpful, supplying lots of details. But here, you might reconsider, preferring other ways of supplying details.

We accept, and return, a URL. The formal parameter url is accurate, but a little on the vague side. Consider naming it video_search_url, to distinguish it from the result URL.

The narrative mentions "... and extract preview MP4 URLs" (plural). The ":return:" text seems more precise, and more accurately describes the behavior. Consider improving the narrative text, so we don't even need a ":return:" section.

type hint

I don't really need to be told url: str, but in this function some annotation would help the caller:

def extract_frame(video_url, timestamp, output_path):
    """
    Extract a single frame from a remote video using ffmpeg.
    """

When I read timestamp I think str or datetime, but then {timestamp:.3f} says float. So maybe ... , seconds_offset: float, ... in the signature? Also, consulting the "Time duration" section of the man page reveals that 1.25 might be spelled "1.25s", "1250ms", or "1250000us", before we even get into "00:01.25". So this app is using a simplified subset, which is perfectly fine. But now we no longer get to rely on ffmpeg's docs, and have to shoulder that documentation burden ourselves.

if os.system(command) ...

This is OK as far as it goes, but please bear in mind that for a long time the docs have encouraged callers to prefer the modern subprocess module.

... using that module is recommended to using this function.

Even if the current usage does what's wanted on all platforms, it's common enough that a maintainer will want to tweak some aspect later. Starting out with e.g. check_call() can put us on a more solid and well tested basis for maintenance activities some months down the road.
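As a sketch of what that could look like, the version below builds the ffmpeg command as an argument list and hands it to `check_call()`. The flags are the ones from the OP code (with `-y` moved ahead of the output file); the helper name `build_ffmpeg_command`, the `seconds_offset` parameter name, and the 30-second timeout are my assumptions.

```python
import subprocess


def build_ffmpeg_command(video_url, seconds_offset, output_path):
    """Return the ffmpeg argument list; flags are taken from the OP code."""
    return [
        "ffmpeg", "-y", "-loglevel", "error",
        "-ss", f"{seconds_offset:.3f}",
        "-i", video_url,
        "-frames:v", "1",
        str(output_path),
    ]


def extract_frame(video_url, seconds_offset, output_path, timeout_seconds=30):
    """Extract one frame; raises CalledProcessError or TimeoutExpired on failure."""
    subprocess.check_call(
        build_ffmpeg_command(video_url, seconds_offset, output_path),
        timeout=timeout_seconds,
    )
```

Letting the exception propagate means the caller decides whether to exit, retry, or ask for another query.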

DRY

play_preview() mentions "data.jpg" three times. Prefer to assign it to a variable.

And while you're at it, assign to a Path variable. That lets you use concise .exists() and .unlink() calls.
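A minimal sketch of that idea, keeping the OP's file name; `FRAME_PATH` and `remove_frame` are names I picked for illustration.

```python
from pathlib import Path

FRAME_PATH = Path("data.jpg")  # same file name as in the OP code


def remove_frame(frame_path=FRAME_PATH):
    """Delete the frame file if present; do nothing if it is already gone."""
    frame_path.unlink(missing_ok=True)  # missing_ok requires Python 3.8+
```

The single `unlink(missing_ok=True)` call stands in for the `os.path.exists()` check followed by `os.remove()`.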

Consider using "/tmp/data.jpg", so there's no chance of leaving an errant temp file lying around in a source repo where you might accidentally commit it. The OP code puts it wherever ${CWD} happens to point.

magic number

    while currently_at < 10.0:

A convenient place to put that is up in the signature:

def play_preview(video_url: str, duration: float = 10.0) -> None:

roundoff

In a loop which adds, say, a constant 0.1 to a number, we can get cumulative rounding error that leads to figures like 0.3 + ε, 0.8 - ε, and so on.

        currently_at += elapsed

Consider making just a single assignment to start_time, outside the while loop, and then relying on total elapsed rendering time rather than time to render previous frame.
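Roughly, that could look like the sketch below. I have replaced the frame extraction and rendering with a hypothetical `render_frame_at` callback and made the clock injectable, purely so the loop can be shown in isolation.

```python
import time


def play_preview(render_frame_at, duration=10.0, clock=time.perf_counter):
    """Render frames until `duration` seconds of wall-clock time have passed.

    render_frame_at is a hypothetical callback taking the offset in seconds.
    Returns the number of frames rendered.
    """
    start_time = clock()  # assigned once, outside the loop
    currently_at = 0.0
    frames = 0
    while currently_at < duration:
        render_frame_at(currently_at)
        frames += 1
        # Derive the offset from total elapsed time instead of summing deltas.
        currently_at = clock() - start_time
    return frames
```

Since `currently_at` is recomputed from one fixed starting point on every iteration, per-frame measurement error does not accumulate.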

CLI

    search_query = input("Enter Yandex search query: ")

You might prefer to do this only if no command line search argument was supplied; import typer can be a very convenient way to implement that.
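To avoid a third-party dependency in the example, here is the same idea sketched with the standard library's argparse in place of typer. The function name `get_search_query` is hypothetical.

```python
import argparse


def get_search_query(argv=None, ask=input):
    """Return the query from the command line, or prompt if none was given."""
    parser = argparse.ArgumentParser(
        description="Play a Yandex video preview as ASCII art."
    )
    parser.add_argument(
        "query", nargs="*", help="search terms (prompted for if omitted)"
    )
    args = parser.parse_args(argv)  # argv=None means sys.argv[1:]
    if args.query:
        return " ".join(args.query)
    return ask("Enter Yandex search query: ")
```

Running the script with search terms on the command line would then skip the prompt, while running it with no arguments behaves as it does today.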

\$\endgroup\$
1
\$\begingroup\$

Two things to start off with

In the explanation, you mention

and repeats step 5 with the current time until it reaches 10 seconds, which is the length of Yandex video previews.

But that information is not reflected in the code, except as a mysterious 10.0. It would be better to store it in a variable, max_preview_length = 10.0, and then compare against that. That way, a reader will know why it has that value without reading your explanation, and there is a single, clearly named variable to change if that duration ever changes.

Also,

def ensure_ffmpeg_installed():
    """
    Ensure ffmpeg is available on the system.
    Exit cleanly if it is missing.
    """
    if shutil.which("ffmpeg") is None:
        print("ffmpeg is not installed.")
        print("Install it with:")
        print("  sudo apt install ffmpeg")
        sys.exit(1)

This is a good check to have, but suggesting sudo apt install ffmpeg restricts this advice to a subset of Linux distributions. It's not exactly terrible to have, but be aware of that.

\$\endgroup\$
0
\$\begingroup\$

DRY

In the play_preview function, this string is used several times: "data.jpg". You can set it to a constant, then use the constant everywhere:

FILE_NAME = "data.jpg"
while currently_at < 10.0:
    start_time = time.perf_counter()

    extract_frame(video_url, currently_at, FILE_NAME)

    try:
        render_ascii(FILE_NAME)
    except OSError as e:
        print(f"Image rendering failed: {e}")
        sys.exit(1)
    finally:
        if os.path.exists(FILE_NAME):
            os.remove(FILE_NAME)
\$\endgroup\$
