vidarh/shell-parser

Shell Parser

A compact Ruby parser for POSIX Shell Command Language syntax that produces a simple AST suitable for syntax highlighting and shell execution.

Goal

Remain compact and minimalist in design while offering reasonably complete parsing.

Features

  • Tokenization - Breaks shell commands into tokens with position tracking
  • Simple AST - Clean, easy-to-traverse abstract syntax tree
  • POSIX Compliance - Based on POSIX Shell Command Language specification
  • Quoting Support - Handles single quotes, double quotes, and backslash escaping
  • Expansions - Parses variable expansions ($VAR, ${VAR}) and command substitution ($(...), `...`)
  • Command Structures - Pipelines, lists (&&, ||, ;, &), and redirections
  • Compact - ~320 lines of clean, readable Ruby code

Installation

Add this line to your application's Gemfile:

gem 'shell_parser'

And then execute:

bundle install

Or install it yourself as:

gem install shell_parser

AST Node Types

Word

A word is composed of one or more parts:

Word = Struct.new(:parts, :pos, :len)
# parts: array of Literal, Variable, or CommandSub
# pos: character position in input
# len: total length

Word Parts

Literal - Plain text (possibly quoted):

Literal = Struct.new(:value, :pos, :len, :quote_style)
# value: the text content
# quote_style: :none, :single, or :double

Variable - Variable expansion:

Variable = Struct.new(:name, :pos, :len, :braced, :quote_style)
# name: variable name (e.g., "HOME" for $HOME)
# braced: true if ${VAR} form, false if $VAR
# quote_style: :none or :double (variables don't expand in single quotes)

CommandSub - Command substitution:

CommandSub = Struct.new(:command, :pos, :len, :style, :quote_style)
# command: the command text to execute
# style: :dollar for $(cmd) or :backtick for `cmd`
# quote_style: :none or :double

Command

A simple command with arguments and redirections:

Command = Struct.new(:words, :redirects)
# words: array of Word nodes
# redirects: array of Redirect nodes

Pipeline

Commands connected by pipes (|):

Pipeline = Struct.new(:commands, :negated)
# commands: array of Command nodes
# negated: boolean (for future ! pipeline support)

List

Commands connected by control operators:

List = Struct.new(:left, :op, :right)
# left/right: Command, Pipeline, or List
# op: :and (&&), :or (||), :semi (;), :background (&)
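
To make the nesting concrete, here is a minimal sketch that builds by hand the tree `make && make test || echo failed` would plausibly produce. It assumes left-associative grouping, as POSIX specifies for AND-OR lists; the parser's actual output may differ. The Struct is redefined locally so the snippet runs on its own, commands are abbreviated to plain strings, and `describe` and `OP_SYMBOLS` are hypothetical names, not part of the library.

```ruby
# Local stand-in for ShellParser::List so the snippet is self-contained.
List = Struct.new(:left, :op, :right)

# Assumed shape for: make && make test || echo failed
# (left-associative: the && list is the left operand of ||).
tree = List.new(
  List.new("make", :and, "make test"),
  :or,
  "echo failed"
)

OP_SYMBOLS = { and: "&&", or: "||", semi: ";", background: "&" }.freeze

# Hypothetical helper: render a tree with explicit grouping.
def describe(node)
  return node unless node.is_a?(List)
  "(#{describe(node.left)} #{OP_SYMBOLS.fetch(node.op)} #{describe(node.right)})"
end

puts describe(tree)
```

Printing the grouping this way is a quick check that an executor will evaluate the operators in the order you expect.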

Redirect

I/O redirection:

Redirect = Struct.new(:type, :fd, :target)
# type: :in (<), :out (>), :append (>>), :heredoc (<<), etc.
# fd: file descriptor number (optional)
# target: Word node
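
As an illustration of how an executor might consume a Redirect, the sketch below resolves one into a descriptor number, a path, and a file mode. The default descriptors (0 for input, 1 for output and append) follow POSIX; `REDIRECT_RULES` and `resolve_redirect` are hypothetical names, the Struct is redefined locally, and the target is a plain String here rather than a Word. Here-documents and any other types are left out.

```ruby
# Local stand-in for ShellParser::Redirect; target is a plain String here
# for brevity (the real node holds a Word that must be expanded first).
Redirect = Struct.new(:type, :fd, :target)

# Default descriptor and file mode per redirect type.
# :heredoc and other types are deliberately left out of this sketch.
REDIRECT_RULES = {
  in:     { fd: 0, mode: "r" },
  out:    { fd: 1, mode: "w" },
  append: { fd: 1, mode: "a" }
}.freeze

# Hypothetical helper: resolve a Redirect into [fd, path, mode].
def resolve_redirect(redirect)
  rule = REDIRECT_RULES.fetch(redirect.type) do
    raise ArgumentError, "unsupported redirect type: #{redirect.type.inspect}"
  end
  [redirect.fd || rule[:fd], redirect.target, rule[:mode]]
end

p resolve_redirect(Redirect.new(:out, nil, "output.txt"))
p resolve_redirect(Redirect.new(:append, 2, "error.log"))
```

The resulting triple lines up with the `fd => [path, mode]` redirection form that `Process.spawn` accepts, which is one way to apply it.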

Usage

Basic Parsing

require_relative 'shell_parser'

# Parse a command
ast = ShellParser.parse("ls -la /tmp")
# => Command with 3 words

# Parse a pipeline
ast = ShellParser.parse("cat file.txt | grep error | wc -l")
# => Pipeline with 3 commands

# Parse command lists
ast = ShellParser.parse("make && make test || echo failed")
# => List with nested lists

Syntax Highlighting

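Each word part carries its position, length, and quote style, which is the information a syntax highlighter needs. The highlight_* methods below are placeholders for your own rendering code: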

ast = ShellParser.parse("echo $HOME > output.txt")

ast.words.each do |word|
  word.parts.each do |part|
    case part
    when ShellParser::Literal
      case part.quote_style
      when :single then highlight_single_quoted(part.value, part.pos, part.len)
      when :double then highlight_double_quoted(part.value, part.pos, part.len)
      else highlight_literal(part.value, part.pos, part.len)
      end
    when ShellParser::Variable
      highlight_variable(part.name, part.pos, part.len, part.braced)
    when ShellParser::CommandSub
      highlight_command_sub(part.command, part.pos, part.len, part.style)
    end
  end
end

ast.redirects.each do |redir|
  highlight_redirection(redir.type, redir.target)
end

The structured representation makes it easy to apply context-aware highlighting:

# "Hello $USER" is represented as:
word.parts #=> [
  Literal("Hello ", quote_style: :double),
  Variable("USER", quote_style: :double)
]
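
One way to turn positions into coloured output is sketched below. It assumes that pos is a 0-based character offset into the input and that len covers the part's full source span (including the leading `$` of a variable); check both against the parser's real output before relying on them. The Structs are redefined locally, the parts are built by hand, and `colorize` and `COLORS` are hypothetical names.

```ruby
# Local stand-ins for the ShellParser node types.
Literal  = Struct.new(:value, :pos, :len, :quote_style)
Variable = Struct.new(:name, :pos, :len, :braced, :quote_style)

# ANSI colour codes chosen arbitrarily for this sketch.
COLORS = { Literal => 32, Variable => 36 }.freeze

# Hypothetical helper: colour each part's span, copying everything
# between spans (whitespace, operators) through unchanged.
def colorize(source, parts)
  out = +""
  cursor = 0
  parts.sort_by(&:pos).each do |part|
    out << source[cursor...part.pos]
    out << "\e[#{COLORS.fetch(part.class)}m" << source[part.pos, part.len] << "\e[0m"
    cursor = part.pos + part.len
  end
  out << source[cursor..-1]
end

source = "echo $HOME"
parts = [
  Literal.new("echo", 0, 4, :none),        # assumed span of "echo"
  Variable.new("HOME", 5, 5, false, :none) # assumed span of "$HOME"
]
puts colorize(source, parts)
```

Slicing the original input by pos and len, rather than re-rendering from value or name, keeps quotes and escapes exactly as the user typed them.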

Shell Execution

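The AST can be walked directly to execute commands. In the code below, execute_command, setup_pipe, and execute_in_pipeline are placeholders for helpers you supply: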

ast = ShellParser.parse("echo $HOME > output.txt")

# Expand words by processing their parts
def expand_word(word)
  word.parts.map do |part|
    case part
    when ShellParser::Literal
      part.value  # Use as-is
    when ShellParser::Variable
      ENV[part.name] || ""  # Look up variable
    when ShellParser::CommandSub
      `#{part.command}`.chomp  # Execute command
    end
  end.join
end

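# Execute based on AST structure; returns the exit status.
# Assumes right is nil when nothing follows a trailing ; or &.
def execute(node)
  case node
  when ShellParser::Command
    args = node.words.map { |w| expand_word(w) }
    execute_command(args, node.redirects)

  when ShellParser::Pipeline
    setup_pipe do
      node.commands.each do |cmd|
        args = cmd.words.map { |w| expand_word(w) }
        execute_in_pipeline(args, cmd.redirects)
      end
    end

  when ShellParser::List
    if node.op == :background
      # `left &` runs left asynchronously; right (if any) starts immediately
      fork { execute(node.left) }
      return node.right ? execute(node.right) : 0
    end

    result = execute(node.left)
    case node.op
    when :and  then result == 0 ? execute(node.right) : result
    when :or   then result != 0 ? execute(node.right) : result
    when :semi then node.right ? execute(node.right) : result
    end
  end
end

execute(ast)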

The quote_style field tells you how to handle word splitting and glob expansion:

part.quote_style == :none    # Apply glob expansion and word splitting
part.quote_style == :single  # Use literal value, no expansion
part.quote_style == :double  # Expand variables/commands, but no glob/split
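
The rules above can be sketched as a field-splitting expander that returns an array of arguments per word instead of a single joined string. This is a simplified illustration, not the library's behaviour: it splits on whitespace only (no IFS handling), skips globbing and CommandSub, and takes the environment as a plain Hash. The Structs are redefined locally and `expand_to_fields` is a hypothetical name.

```ruby
# Local stand-ins for the ShellParser node types.
Literal  = Struct.new(:value, :pos, :len, :quote_style)
Variable = Struct.new(:name, :pos, :len, :braced, :quote_style)
Word     = Struct.new(:parts, :pos, :len)

# Hypothetical helper: expand one Word into zero or more argument fields.
# Only unquoted variable expansions are split; everything else is appended
# to the field currently being built.
def expand_to_fields(word, env)
  fields = []
  current = nil # nil means no field has been started yet
  word.parts.each do |part|
    text = part.is_a?(Variable) ? env.fetch(part.name, "") : part.value
    unless part.is_a?(Variable) && part.quote_style == :none
      current = current.to_s + text
      next
    end
    # Unquoted expansion: whitespace in the value separates fields.
    if text.match?(/\A\s/) && current
      fields << current
      current = nil
    end
    chunks = text.split
    chunks.each_with_index do |chunk, i|
      current = current.to_s + chunk
      last = (i == chunks.size - 1)
      if !last || text.match?(/\s\z/)
        fields << current
        current = nil
      end
    end
  end
  fields << current if current
  fields
end

env = { "FILES" => "a.txt b.txt" }
unquoted = Word.new([Variable.new("FILES", 0, 6, false, :none)], 0, 6)
quoted   = Word.new([Variable.new("FILES", 1, 6, false, :double)], 0, 8)
p expand_to_fields(unquoted, env)
p expand_to_fields(quoted, env)
```

Returning an array per word matters because an unquoted, unset variable should contribute no argument at all, while a double-quoted one contributes a single empty argument.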

Supported Syntax

Simple Commands

ls -la /tmp
echo "hello world"

Pipelines

cat file.txt | grep pattern | wc -l

Command Lists

make && make test           # AND - execute if previous succeeds
make || echo "failed"       # OR - execute if previous fails
make ; make test            # Sequential - always execute both
sleep 10 &                  # Background job

Redirections

command < input.txt         # Input redirection
command > output.txt        # Output redirection
command >> output.txt       # Append
command 2>> error.log       # Append stderr

Quoting

echo 'single quotes preserve everything literally'
echo "double quotes allow $VAR expansion"
echo escaped\ space

Expansions

echo $HOME                  # Variable expansion
echo ${USER}                # Variable expansion (braced)
echo $(date)                # Command substitution
echo `whoami`               # Command substitution (backticks)

Examples

See examples.rb for complete working examples of:

  • Syntax highlighting with token positions
  • Execution plan generation from AST
  • Pretty-printing AST structures

Run examples:

ruby examples.rb

Design Goals

  1. Simplicity - Clean, understandable code without excessive abstraction
  2. Compactness - Core parser in ~320 lines
  3. Practicality - Focus on two main use cases:
    • Syntax highlighting (needs tokens with positions)
    • Shell execution (needs command structure)
  4. POSIX Foundation - Based on POSIX spec but simplified where practical

Limitations

This is a simplified parser focused on the core syntax. Not currently supported:

  • Compound commands (if, while, for, case, {...}, (...))
  • Function definitions
  • Arithmetic expansion $((...))
  • Parameter expansion modifiers ${var:-default}
  • Here-documents (parsed but not fully implemented)
  • Pattern matching and globbing
  • Reserved words as special tokens

These can be added incrementally as needed.

Architecture

Lexer (ShellParser::Lexer)

  • Scans input character by character
  • Handles quoting, escaping, and special characters
  • Produces token stream with position information
  • Preserves metadata for syntax highlighting
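
The scanning approach described above can be illustrated with a much smaller tokenizer. This is a sketch built on Ruby's standard StringScanner, not the code in ShellParser::Lexer: it recognises only operators and bare words, and ignores quoting, escaping, and expansions entirely. `Token`, `OPERATORS`, and `tokenize` are hypothetical names.

```ruby
require "strscan"

# Hypothetical token shape: type, text, and character offset in the input.
Token = Struct.new(:type, :value, :pos)

# Two-character operators are listed first so "&&" is not read as "&", "&".
OPERATORS = /&&|\|\||>>|[|;&<>]/

def tokenize(input)
  scanner = StringScanner.new(input)
  tokens = []
  until scanner.eos?
    next if scanner.skip(/\s+/)  # whitespace separates tokens
    pos = scanner.charpos        # character offset, not byte offset
    if (op = scanner.scan(OPERATORS))
      tokens << Token.new(:operator, op, pos)
    else
      # Anything else runs up to the next whitespace or operator character.
      tokens << Token.new(:word, scanner.scan(/[^\s|;&<>]+/), pos)
    end
  end
  tokens
end

tokenize("ls -la | wc -l").each { |t| puts "#{t.pos}\t#{t.type}\t#{t.value}" }
```

Recording the offset before each scan is what lets later stages map AST nodes back to the original input for highlighting.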

Parser (ShellParser::Parser)

  • Recursive descent parser
  • Consumes tokens to build AST
  • Handles operator precedence
  • Simple error reporting
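
To show how recursive descent encodes operator precedence, here is a minimal sketch that follows the layering of the POSIX grammar: `|` binds tightest, then `&&` and `||` (equal precedence, left-associative), then `;` and `&`. It is not the code in ShellParser::Parser. It works on an array of token strings, ignores redirections, and assumes a trailing `;` or `&` leaves right as nil. `MiniParser` and the simplified Structs are hypothetical.

```ruby
# Simplified stand-ins for the AST nodes (no redirects, words are Strings).
Command  = Struct.new(:words)
Pipeline = Struct.new(:commands)
List     = Struct.new(:left, :op, :right)

class MiniParser
  SEPARATORS = { ";" => :semi, "&" => :background }.freeze
  AND_OR     = { "&&" => :and, "||" => :or }.freeze

  def initialize(tokens)
    @tokens = tokens.dup
  end

  # list := and_or ((';' | '&') and_or?)*
  def parse_list
    left = parse_and_or
    while (op = SEPARATORS[@tokens.first])
      @tokens.shift
      right = @tokens.empty? ? nil : parse_and_or
      left = List.new(left, op, right)
    end
    left
  end

  # and_or := pipeline (('&&' | '||') pipeline)*
  def parse_and_or
    left = parse_pipeline
    while (op = AND_OR[@tokens.first])
      @tokens.shift
      left = List.new(left, op, parse_pipeline)
    end
    left
  end

  # pipeline := command ('|' command)*
  def parse_pipeline
    commands = [parse_command]
    while @tokens.first == "|"
      @tokens.shift
      commands << parse_command
    end
    commands.size == 1 ? commands.first : Pipeline.new(commands)
  end

  # command := word+
  def parse_command
    words = []
    words << @tokens.shift while @tokens.first && !operator?(@tokens.first)
    raise ArgumentError, "expected a command" if words.empty?
    Command.new(words)
  end

  def operator?(token)
    token == "|" || SEPARATORS.key?(token) || AND_OR.key?(token)
  end
end

p MiniParser.new(%w[a | b && c ; d]).parse_list
```

Each grammar level calls the next tighter-binding level for its operands, so precedence falls out of the call structure and no explicit precedence table is needed.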

AST (Struct-based nodes)

  • Lightweight node types using Ruby Structs
  • Easy to pattern match and traverse
  • Minimal memory overhead

rsh Integration Roadmap

rsh is a Ruby shell that currently uses an 80-line tokenizer for command parsing. The integration path with shell_parser:

  1. Add as dependency - Add gem 'shell_parser' to rsh's Gemfile and require 'shell_parser' in the main entry point.
  2. Replace tokenizer - Replace tokenize_command / parse_shell_command in command_parser.rb with ShellParser.parse, gaining proper AST-driven parsing for pipelines, lists, redirects, and quoting.
  3. AST-driven execution - Use the structured AST (Command, Pipeline, List) for execution instead of passing raw command strings to exec, enabling proper variable expansion, pipeline setup, and redirection handling within the Ruby process.
