Say Scripting with LLMs

How does one get the terminal to speak output?

There are various ways to achieve the goal with accessibility tools (e.g. screen readers like VoiceOver on macOS, or libraries like emacspeak). But suppose we want something simple…

Say

On macOS, the built-in text-to-speech (TTS) utility is say:

say hello

… says hello. There are a few simple flags to control the speech rate, voice, etc.
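As a rough illustration of those flags, here is a small wrapper sketch. It relies only on say's -r (rate, in words per minute) and -v (voice) options. The function name speak and the variables SPEECH_RATE and SPEECH_VOICE are made up for this example, and any voice name has to be one that is actually installed.

```shell
# Hypothetical wrapper around macOS `say`. SPEECH_RATE and SPEECH_VOICE are
# illustrative variable names, not something `say` itself reads.
speak() {
    local rate="${SPEECH_RATE:-300}"   # words per minute, defaulting to 300
    if [ -n "${SPEECH_VOICE:-}" ]; then
        # -v selects a voice; it must be installed on the machine
        say -r "$rate" -v "$SPEECH_VOICE" "$@"
    else
        say -r "$rate" "$@"
    fi
}
```

For example, SPEECH_VOICE=Samantha speak hello uses the Samantha voice if it is installed, and say -v '?' lists the voices available on the machine.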

To print command results to the terminal while also calling say, we can use tee to duplicate the input:

date | tee /dev/tty | say

An alias like alias dates='date | tee /dev/tty | say' can be convenient, but any extra arguments are appended to the end of the pipeline (i.e. passed to say) rather than to the original command. Alternatively, we can define several such commands as functions (e.g. in a script sourced from ~/.zshrc).

#!/bin/zsh

typeset SPEECH_RATE=300
typeset -A saycommands
saycommands=(
    lss "ls -1"
    pwds "pwd"
    dates "date"
)

for key in ${(k)saycommands}; do
    command="${saycommands[$key]}"
    eval "
        function ${key} {
            ${command} \"\$@\" | tee /dev/tty | say -r \${SPEECH_RATE}
        }
    "
done

echo "Say-enabled commands: ${(k)saycommands}"
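To make the eval less opaque, this is roughly what the loop generates for the dates entry once the escaped \"\$@\" and \${SPEECH_RATE} are resolved, written out by hand as a sketch. The only departure from the generated code is the :-300 fallback, an assumption added here so the function still works when SPEECH_RATE is unset.

```shell
# Hand-written equivalent of the generated `dates` function (a sketch).
dates() {
    # "$@" forwards extra arguments to date itself, e.g. `dates -u`,
    # which the alias version cannot do.
    date "$@" | tee /dev/tty | say -r "${SPEECH_RATE:-300}"
}
```

Calling dates -u then prints and speaks the UTC time, whereas with the alias the -u would land at the end of the pipeline, on say.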

Say Mode? Asking an LLM

For a more general solution, maybe we want the terminal to always speak output. And for non-experts in zshell, maybe we want help from an LLM.

The LLMs tried are: (1) ChatGPT 4o in the macOS desktop app; (2) Mistral’s default model in the web UI. In both cases, the initial prompt is simple: “In MacOS ZSH, I want to intercept every shell output and also speak it aloud with say -r 300”.

Interestingly, ChatGPT didn’t generate working code even after a few iterations, while Mistral quickly got something working, albeit a bit buggy. This is Mistral’s initial working version, which speaks the output once but prints it three times: link to gist.

I didn’t spend too much time engineering the prompts, but toggling back and forth between ChatGPT and Mistral, I found that:

neither model could fully and independently resolve the triple-printing issue
both models are better at debugging than generating code
the differential diagnosis between models (i.e. comparing their outputs) is useful

Overall, the results were roughly on par with expectations. For a modest task, the LLMs can produce a good draft, allowing the human programmer to focus on refining the results. (Also, the time savings were meaningful in this instance, since I know very little about shell/zshell scripting.) And although this is by no means a rigorous comparison, I preferred the speed and quality of Mistral.

This is the final, working version, which allows say mode to be toggled in the terminal with saymode on and saymode off and includes a prompt indicator. It’s not particularly elegant and lacks some obvious usability features, but it works as a proof of concept.

#!/bin/zsh

typeset -g SAYMODE_ENABLED=false
typeset -g LAST_COMMAND_OUTPUT=""
typeset -g DEFAULT_PROMPT="$PROMPT"

################################################################################


function speak_last_output() {
    if [[ $SAYMODE_ENABLED == true && -n "$LAST_COMMAND_OUTPUT" ]]; then
        echo "$LAST_COMMAND_OUTPUT    " | say -r 300
        LAST_COMMAND_OUTPUT=""
    fi
}

function capture_output() {
    LAST_COMMAND_OUTPUT=$(eval "$1" 2>&1)
    return $?  # status of the captured command; PIPESTATUS is bash-only (zsh uses lowercase pipestatus)
}

################################################################################

function preexec() {
    # Check if the command is not empty
    if [[ -n "$1" && $SAYMODE_ENABLED == true ]]; then
        capture_output "$1"
        # Note: zsh ignores preexec's return value, so the original command
        # still runs (and prints its output) after this capture
        return 1
    fi
}

preexec_functions+=(preexec)
precmd_functions+=(speak_last_output)

################################################################################

function saymode_on() {
    SAYMODE_ENABLED=true
    PROMPT="%K{green}SAYMODE%k $DEFAULT_PROMPT"
    echo "Say mode enabled."
}

# Define the saymode off command
function saymode_off() {
    PROMPT="$DEFAULT_PROMPT"
    echo "Say mode disabled."
    SAYMODE_ENABLED=false
}

function saymode() {
    if [[ "$1" == "on" ]]; then
        saymode_on
    elif [[ "$1" == "off" ]]; then
        saymode_off
    else
        echo "Usage: saymode {on|off}"
    fi
}

Can You Do Better?

Some usability improvements that come to mind:

output can be automatically limited in length
commands can be whitelisted or blacklisted, since e.g. “vi” or “man”, which start a new program or a terminal paging mode, are not compatible
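The two ideas above can be sketched with a pair of small helpers. This is only an illustration: the function names, the list of blocked commands, and the 5-line default are all assumptions, and it limits output by lines, which is just one of several reasonable choices.

```shell
# Hypothetical helpers sketching the two improvements; the names, the
# blocked-command list, and the 5-line default are illustrative assumptions.

# Succeed if the command line starts with an interactive program
# (editor, pager, ...) that should not be captured and spoken.
is_interactive_command() {
    case "${1%%[[:space:]]*}" in   # first word of the command line
        vi|vim|nano|man|less|more|top|ssh) return 0 ;;
        *) return 1 ;;
    esac
}

# Pass through only the first N lines of stdin (default 5).
limit_lines() {
    head -n "${1:-5}"
}
```

Inside a hook, is_interactive_command "$1" && return would skip the capture step for such programs, and piping the captured text through limit_lines before say would keep long listings from being read out in full.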

What does a different model like Grok 3 think?

I want you to comment on this Zsh Voice mode program. What are some good usability or feature enhancements to consider?

Given a simple prompt like the above, Grok 3 yields an impressive answer.

It identifies and solves both the above points without any specific prompting
It provides several more suggestions and implements them with lucid explanations
It identifies several nuances and gotchas, providing clear explanations

Overall, it generates a more complex and feature-rich program (copy-paste of the interactive session). The revised script runs immediately without any modifications!

There are some oddities in the shell experience introduced by the more advanced functionality, which Grok 3 couldn’t fully debug, so in the end I reverted to the original, simpler script and asked it to implement only a subset of improvements.

Here is the revised, working script, verbatim from Grok 3 (except for the whitelist of commands).

# Global Variables
typeset -g SAYMODE_ENABLED=false              # Tracks whether Say Mode is enabled
typeset -g LAST_COMMAND_OUTPUT=""             # Stores the last command's output
typeset -g DEFAULT_PROMPT="$PROMPT"           # Stores the default prompt for restoration
typeset -gi SAYMODE_OUTPUT_LIMIT=${SAYMODE_OUTPUT_LIMIT:-500}  # Output limit in characters, default 500
typeset -g -A SAYMODE_COMMANDS=(
    ls 1 pwd 1 echo 1 date 1
    whoami 1 hostname 1 uname 1 which 1 where 1
    type 1 env 1 printenv 1 alias 1
    du 1 df 1 ll 1 la 1 lh 1 
)  # Whitelist of safe commands

# Function to speak the last command output
function speak_last_output() {
    if [[ $SAYMODE_ENABLED == true && -n "$LAST_COMMAND_OUTPUT" ]]; then
        local output_to_speak
        if [[ $SAYMODE_OUTPUT_LIMIT -gt 0 ]]; then
            output_to_speak="${LAST_COMMAND_OUTPUT:0:$SAYMODE_OUTPUT_LIMIT}"
        else
            output_to_speak="$LAST_COMMAND_OUTPUT"
        fi
        echo "$output_to_speak    " | say -r 300  # Preserve original rate of 300 wpm
        LAST_COMMAND_OUTPUT=""
    fi
}

# Function to capture command output
function capture_output() {
    LAST_COMMAND_OUTPUT=$(eval "$1" 2>&1)  # Capture stdout and stderr
    return ${PIPESTATUS[0]}  # Return the exit status of the command
}

# Pre-execution hook to process whitelisted commands
function preexec() {
    if [[ $SAYMODE_ENABLED == true && -n "$1" ]]; then
        local cmd_name=${1%%[[:space:]]*}  # Extract the first word of the command
        if [[ -n "${SAYMODE_COMMANDS[$cmd_name]}" ]]; then  # Check if command is in whitelist
            capture_output "$1"
            return 1  # Prevent original command execution
        fi
    fi
}

# Set up Zsh hooks
preexec_functions+=(preexec)      # Run preexec before each command
precmd_functions+=(speak_last_output)  # Run speak_last_output after each command

# Function to enable Say Mode
function saymode_on() {
    SAYMODE_ENABLED=true
    PROMPT="%K{green}SAYMODE%k $DEFAULT_PROMPT"  # Update prompt to indicate Say Mode
    echo "Say mode enabled."
}

# Function to disable Say Mode
function saymode_off() {
    PROMPT="$DEFAULT_PROMPT"  # Restore original prompt
    SAYMODE_ENABLED=false
    echo "Say mode disabled."
}

# Main Say Mode control function with toggle
function saymode() {
    if [[ "$1" == "on" ]]; then
        saymode_on
    elif [[ "$1" == "off" ]]; then
        saymode_off
    elif [[ -z "$1" ]]; then  # Toggle if no argument is provided
        if [[ $SAYMODE_ENABLED == true ]]; then
            saymode_off
        else
            saymode_on
        fi
    else
        echo "Usage: saymode {on|off}"  # Updated usage message
    fi
}