sqlite_parser/debug.py annotated source


Debugging Utilities

This module provides tools for inspecting parser internals: token streams, AST trees, trace logs, and parser state. They are most useful when developing new parser features or diagnosing a parse failure.

Debugging Capabilities

  1. Token Stream Visualization: Pretty-print tokens with optional highlighting
  2. AST Formatting: Hierarchical display of AST node trees
  3. Parser Tracing: Log parser method calls and decisions
  4. State Inspection: View parser state at any point
  5. Context Managers: Temporarily enable debugging
  6. High-Level Debug Parse: One-function debugging workflow

Usage Patterns

Quick Debugging

```python
from sqlite_parser.debug import debug_parse

# Parse with full tracing
statements = debug_parse("SELECT * FROM users", verbose=True)
```

Detailed Token Inspection

```python
from sqlite_parser import tokenize_sql
from sqlite_parser.debug import print_tokens

tokens = tokenize_sql("SELECT id FROM users")
print_tokens(tokens, highlight_pos=2)  # Highlight token at position 2
```

AST Inspection

```python
from sqlite_parser import parse_sql
from sqlite_parser.debug import print_ast

ast = parse_sql("SELECT * FROM users")
print_ast(ast)  # Pretty-print entire AST tree
```

Parser State Tracking

```python
from sqlite_parser.debug import parser_debug_context, print_state

# `parser` is an existing Parser instance
with parser_debug_context(parser):
    result = parser.parse()
    state = parser.get_state()
    print_state(state)
```
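
Judging from format_parser_state() further down, the state dictionary carries the keys pos, token_type, token_value, depth, active_method, and stack. The sketch below builds such a dictionary by hand to show its shape. The values are invented for illustration, and summarize_state is a hypothetical stand-in, not the module's formatter.

```python
# Hypothetical state values; only the key names come from format_parser_state().
state = {
    "pos": 2,
    "token_type": "FROM",
    "token_value": "FROM",
    "depth": 1,
    "active_method": "parse_select",
    "stack": ["parse", "parse_select"],
}


def summarize_state(state: dict) -> str:
    """Stand-in formatter: one line per scalar field, then the call stack."""
    lines = [
        f"pos={state['pos']}",
        f"token={state['token_type']}:{state['token_value']!r}",
        f"depth={state['depth']}",
        f"active={state['active_method']}",
    ]
    # One numbered line per call stack entry, outermost first
    lines.extend(f"  {i}. {name}" for i, name in enumerate(state["stack"]))
    return "\n".join(lines)


print(summarize_state(state))
```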

"""
Debug utilities for SQLite parser

Provides formatting and inspection tools for debugging the parser,
including token stream visualization, AST formatting, and parser tracing.
"""

from typing import List, Any
from contextlib import contextmanager
from .lexer import Token
from .ast_nodes import ASTNode
from .parser import Parser

Token Stream Formatter

format_token_stream() returns a tabular string listing each token's index, type, and value between 70-character rule lines. Use it to check how the lexer split the input.

Features

  • Indexed: Each token shows its position in the stream
  • Type Display: Token type name (SELECT, IDENTIFIER, NUMBER, etc.)
  • Value Display: The actual text value
  • Highlighting: Optional >>> marker for a specific token position

Output Example

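```
======================================================================
Token Stream
======================================================================
    [  0] SELECT          'SELECT'
    [  1] STAR            '*'
    [  2] FROM            'FROM'
    [  3] IDENTIFIER      'users'
======================================================================
```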

The highlighting helps visualize parser position during debugging.
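
The row layout can be tried without the package. The sketch below uses a stand-in Token tuple and TokenType enum (both hypothetical; the real classes live in sqlite_parser.lexer) and reproduces only the per-row formatting, not the header and rule lines.

```python
from enum import Enum, auto
from typing import List, NamedTuple, Optional


class TokenType(Enum):  # hypothetical stand-in for the lexer's token types
    SELECT = auto()
    STAR = auto()
    FROM = auto()
    IDENTIFIER = auto()


class Token(NamedTuple):  # hypothetical stand-in for sqlite_parser.lexer.Token
    type: TokenType
    value: str


def format_rows(tokens: List[Token], highlight_pos: Optional[int] = None) -> List[str]:
    """Build one row per token: marker, right-aligned index, padded type, repr of value."""
    rows = []
    for i, token in enumerate(tokens):
        # ">>>" marks the highlighted position; other rows get equal-width padding
        marker = ">>>" if i == highlight_pos else "   "
        rows.append(f"{marker} [{i:3d}] {token.type.name:15} {token.value!r}")
    return rows


tokens = [
    Token(TokenType.SELECT, "SELECT"),
    Token(TokenType.STAR, "*"),
    Token(TokenType.FROM, "FROM"),
    Token(TokenType.IDENTIFIER, "users"),
]
for row in format_rows(tokens, highlight_pos=2):
    print(row)
```

The `:3d` and `:15` format specifiers keep the index and type columns aligned regardless of token length, which is what makes the marker column easy to scan.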

def format_token_stream(tokens: List[Token], highlight_pos: int = None) -> str:
    """
    Format token stream for pretty printing

    Args:
        tokens: List of tokens to format
        highlight_pos: Optional position to highlight

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("=" * 70)
    lines.append("Token Stream")
    lines.append("=" * 70)

    for i, token in enumerate(tokens):
        marker = ">>>" if i == highlight_pos else "   "
        type_str = f"{token.type.name:15}"
        value_str = f"{repr(token.value):20}"
        lines.append(f"{marker} [{i:3d}] {type_str} {value_str}")

    lines.append("=" * 70)
    return "\n".join(lines)

AST Tree Formatter

format_ast() recursively formats AST node trees with proper indentation to show the hierarchical structure. This is essential for understanding what the parser built.

Formatting Rules

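  • Nodes: Show the type name and attributes; span and attributes whose value is None are omitted
  • Lists: Format as [...] with items indented 2 spaces; an empty list prints as []
  • Primitives: Show with repr(), so strings keep their quotes
  • None: Printed as None only when it is the top-level value or a list item
  • Nesting: Attribute names indent 2 spaces under their node; a nested node or list goes on the following lines, indented 4 spaces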

Example Output

    SelectStatement(
      select_core=
        SelectCore(
          columns=
            [
              ResultColumn(
                expression=
                  Identifier(
                    name='id'
                  )
              )
            ]
          from_clause=
            FromClause(
              source=
                TableReference(
                  name=QualifiedIdentifier(
                    parts=['users']
                  )
                )
            )
        )
    )

This visualization makes it easy to verify the parser built the correct structure.
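
The recursion can be tried on its own with a small self-contained sketch. It uses a hypothetical Node dataclass in place of ASTNode and a simplified render function, so it is an illustration of the idea and not the module's implementation. It shows that span and None-valued attributes are skipped, while nested nodes and lists are indented under their attribute name.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Node:  # hypothetical stand-in for ASTNode
    name: str
    children: List[Any] = field(default_factory=list)
    alias: Optional[str] = None
    span: Any = None


def render(value: Any, indent: int = 0) -> str:
    pad = " " * indent
    if isinstance(value, list):
        if not value:
            return pad + "[]"
        inner = [render(item, indent + 2) for item in value]
        return "\n".join([pad + "[", *inner, pad + "]"])
    if not isinstance(value, Node):
        return pad + repr(value)  # primitives and None fall through to repr()
    lines = [pad + f"{type(value).__name__}("]
    for key, attr in vars(value).items():
        if key == "span" or attr is None:
            continue  # skip source spans and unset attributes
        if isinstance(attr, (list, Node)):
            lines.append(f"{pad}  {key}=")
            lines.append(render(attr, indent + 4))
        else:
            lines.append(f"{pad}  {key}={attr!r}")
    lines.append(pad + ")")
    return "\n".join(lines)


tree = Node("root", children=[Node("leaf", span=(0, 4))])
print(render(tree))
```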

def format_ast(node: Any, indent: int = 0) -> str:
    """
    Format AST node tree for pretty printing

    Args:
        node: AST node to format
        indent: Current indentation level

    Returns:
        Formatted string representation
    """
    if node is None:
        return " " * indent + "None"

    if isinstance(node, list):
        if not node:
            return " " * indent + "[]"

        lines = [" " * indent + "["]
        for item in node:
            lines.append(format_ast(item, indent + 2))
        lines.append(" " * indent + "]")
        return "\n".join(lines)

    if not isinstance(node, ASTNode):
        return " " * indent + repr(node)

    # Format AST node
    node_type = type(node).__name__
    lines = [" " * indent + f"{node_type}("]

    # Get node attributes (excluding span)
    attrs = {}
    for key, value in node.__dict__.items():
        if key != 'span' and value is not None:
            attrs[key] = value

    for key, value in attrs.items():
        if isinstance(value, (list, ASTNode)):
            lines.append(" " * (indent + 2) + f"{key}=")
            lines.append(format_ast(value, indent + 4))
        else:
            lines.append(" " * (indent + 2) + f"{key}={repr(value)}")

    lines.append(" " * indent + ")")
    return "\n".join(lines)


def format_parser_trace(trace_log: List[str]) -> str:
    """
    Format parser trace log for pretty printing

    Args:
        trace_log: List of trace messages

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("=" * 70)
    lines.append("Parser Trace")
    lines.append("=" * 70)
    lines.extend(trace_log)
    lines.append("=" * 70)
    return "\n".join(lines)


def format_parser_state(state: dict) -> str:
    """
    Format parser state dictionary for pretty printing

    Args:
        state: State dictionary from parser.get_state()

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("Parser State:")
    lines.append(f"  Position:      {state['pos']}")
    lines.append(f"  Current Token: {state['token_type']}:{repr(state['token_value'])}")
    lines.append(f"  Depth:         {state['depth']}")
    lines.append(f"  Active Method: {state['active_method']}")

    if state['stack']:
        lines.append("  Call Stack:")
        for i, method in enumerate(state['stack']):
            lines.append(f"    {i}. {method}")

    return "\n".join(lines)


@contextmanager
def parser_debug_context(parser: Parser, enable: bool = True):
    """
    Context manager for temporarily enabling parser debug mode

    Args:
        parser: Parser instance
        enable: Whether to enable debug mode

    Yields:
        Parser with debug enabled

    Example:
        with parser_debug_context(parser):
            result = parser.parse()
            parser.print_trace()
    """
    old_debug = parser.debug
    parser.debug = enable

    try:
        yield parser
    finally:
        parser.debug = old_debug

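
The save-and-restore pattern used by parser_debug_context() can be checked in isolation. This sketch uses a hypothetical FakeParser with only a debug attribute and a simplified copy of the context manager (named debug_enabled here to avoid confusion with the real one). It shows that the flag is restored even when the body raises.

```python
from contextlib import contextmanager


class FakeParser:  # hypothetical stand-in; the real Parser has much more state
    def __init__(self) -> None:
        self.debug = False


@contextmanager
def debug_enabled(parser, enable: bool = True):
    old_debug = parser.debug
    parser.debug = enable
    try:
        yield parser
    finally:
        parser.debug = old_debug  # runs on normal exit and on exceptions


parser = FakeParser()
seen_inside = None
try:
    with debug_enabled(parser) as p:
        seen_inside = p.debug
        raise ValueError("simulated parse failure")
except ValueError:
    pass

print(seen_inside, parser.debug)
```

Restoring the previous value, instead of unconditionally setting the flag to False, keeps nested uses and already-enabled parsers behaving as expected.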

def print_tokens(tokens: List[Token], highlight_pos: int = None):
    """
    Print token stream to stdout

    Args:
        tokens: List of tokens
        highlight_pos: Optional position to highlight
    """
    print(format_token_stream(tokens, highlight_pos))


def print_ast(node: Any):
    """
    Print AST tree to stdout

    Args:
        node: AST node or list of nodes
    """
    print("=" * 70)
    print("AST")
    print("=" * 70)
    print(format_ast(node))
    print("=" * 70)


def print_state(state: dict):
    """
    Print parser state to stdout

    Args:
        state: State dictionary from parser.get_state()
    """
    print(format_parser_state(state))

One-Function Debugging

debug_parse() is the fastest way to debug parser issues. It performs the complete parse workflow with optional verbose output showing every step.

What It Does

  1. Lexes the SQL into tokens
  2. Prints token stream (if verbose)
  3. Parses with debug tracing enabled
  4. Prints parser trace log (if verbose)
  5. Prints final AST (if verbose)
  6. Returns parsed statements

Usage

```python
from sqlite_parser.debug import debug_parse

# Quick parse with no output
statements = debug_parse("SELECT * FROM users")

# Full debugging output
statements = debug_parse("SELECT * FROM users", verbose=True)
```

Verbose Output Includes

  • Token stream with indices and types
  • Parser trace showing method calls and token consumption
  • Final AST tree structure

To troubleshoot, pass the problematic SQL with verbose=True and read the trace to see what the parser did at each step. If parsing raises, the trace collected so far and the error message are printed before the exception is re-raised.
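
Because the verbose output goes to stdout through print(), it can be captured with the standard library's contextlib.redirect_stdout, for example to attach it to a bug report. The sketch below uses a hypothetical fake_debug_parse in place of debug_parse so that it runs without the package installed. The capture_debug_output helper is likewise an illustration, not part of the module.

```python
import io
from contextlib import redirect_stdout
from typing import List, Tuple


def fake_debug_parse(sql: str, verbose: bool = False) -> List[str]:
    """Hypothetical stand-in that prints like a verbose parse and returns a result."""
    if verbose:
        print("Token Stream")
        print(f"  {sql.split()}")
    return [sql]


def capture_debug_output(sql: str) -> Tuple[List[str], str]:
    """Run the parse with verbose=True and return (result, captured_text)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = fake_debug_parse(sql, verbose=True)
    return result, buffer.getvalue()


result, report = capture_debug_output("SELECT * FROM users")
print(report)
```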

def debug_parse(sql: str, verbose: bool = False) -> List[ASTNode]:
    """
    Parse SQL with debug tracing enabled

    Args:
        sql: SQL string to parse
        verbose: If True, print the token stream, trace log, and AST

    Returns:
        List of parsed statements

    Example:
        statements = debug_parse("SELECT * FROM users", verbose=True)
    """
    from .lexer import Lexer

    lexer = Lexer(sql)
    tokens = lexer.tokenize()

    if verbose:
        print_tokens(tokens)
        print()

    parser = Parser(tokens, debug=True)

    try:
        result = parser.parse()

        if verbose:
            print(format_parser_trace(parser.get_trace_log()))
            print()
            print_ast(result)

        return result

    except Exception as e:
        # Show the trace collected so far, then let the caller handle the error
        if verbose:
            print(format_parser_trace(parser.get_trace_log()))
            print()
            print(f"Error: {e}")
        raise


# Export main functions
__all__ = [
    'format_token_stream',
    'format_ast',
    'format_parser_trace',
    'format_parser_state',
    'parser_debug_context',
    'print_tokens',
    'print_ast',
    'print_state',
    'debug_parse',
]