sqlite_parser/debug.py annotated source


Debugging Utilities

This module provides debugging tools for inspecting parser internals: the token stream, the AST, the parser trace log, and the parser state. They are useful when developing new parser features or diagnosing parse failures.

Debugging Capabilities

Token Stream Visualization: Pretty-print tokens with optional highlighting
AST Formatting: Hierarchical display of AST node trees
Parser Tracing: Log parser method calls and decisions
State Inspection: View parser state at any point
Context Managers: Temporarily enable debugging
High-Level Debug Parse: One-function debugging workflow

Usage Patterns

Quick Debugging

```python
from sqlite_parser.debug import debug_parse

# Parse with full tracing
statements = debug_parse("SELECT * FROM users", verbose=True)
```

Detailed Token Inspection

```python
from sqlite_parser import tokenize_sql
from sqlite_parser.debug import print_tokens

tokens = tokenize_sql("SELECT id FROM users")
print_tokens(tokens, highlight_pos=2)  # Highlight token at position 2
```

AST Inspection

```python
from sqlite_parser import parse_sql
from sqlite_parser.debug import print_ast

ast = parse_sql("SELECT * FROM users")
print_ast(ast)  # Pretty-print entire AST tree
```

Parser State Tracking

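```python
from sqlite_parser.lexer import Lexer
from sqlite_parser.parser import Parser
from sqlite_parser.debug import parser_debug_context, print_state

# Build a parser the same way debug_parse() does, with debug initially off
tokens = Lexer("SELECT * FROM users").tokenize()
parser = Parser(tokens, debug=False)

with parser_debug_context(parser):
    result = parser.parse()
    state = parser.get_state()
    print_state(state)
```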

"""
Debug utilities for SQLite parser

Provides formatting and inspection tools for debugging the parser,
including token stream visualization, AST formatting, and parser tracing.
"""

from typing import List, Any
from contextlib import contextmanager
from .lexer import Token
from .ast_nodes import ASTNode
from .parser import Parser

Token Stream Formatter

format_token_stream() returns a tabular listing of the token stream: one row per token with its index, type, and value, framed by 70-character rules. Use it to check how the lexer tokenized the input.

Features

Indexed: Each token shows its position in the stream
Type Display: Token type name (SELECT, IDENTIFIER, NUMBER, etc.)
Value Display: The actual text value
Highlighting: Optional >>> marker for a specific token position

Output Example

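```
======================================================================
Token Stream
======================================================================
    [  0] SELECT          'SELECT'
    [  1] STAR            '*'
    [  2] FROM            'FROM'
    [  3] IDENTIFIER      'users'
======================================================================
```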

The highlighting helps visualize parser position during debugging.
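The row layout can be tried in isolation with a minimal sketch. `TokenType`, `Token`, and `format_rows` below are hypothetical stand-ins introduced only for illustration; the real lexer's classes may differ. The sketch assumes only that a token exposes `.type.name` and `.value`, as the formatter does.

```python
from enum import Enum, auto
from typing import List, NamedTuple, Optional


class TokenType(Enum):  # hypothetical stand-in for the lexer's token type enum
    SELECT = auto()
    STAR = auto()
    FROM = auto()
    IDENTIFIER = auto()


class Token(NamedTuple):  # hypothetical stand-in exposing .type and .value
    type: TokenType
    value: str


def format_rows(tokens: List[Token], highlight_pos: Optional[int] = None) -> List[str]:
    """Build one row per token: marker, index, padded type name, padded repr of value."""
    rows = []
    for i, token in enumerate(tokens):
        # Only the token at highlight_pos gets the ">>>" marker
        marker = ">>>" if i == highlight_pos else "   "
        rows.append(f"{marker} [{i:3d}] {token.type.name:15} {repr(token.value):20}")
    return rows


tokens = [
    Token(TokenType.SELECT, "SELECT"),
    Token(TokenType.STAR, "*"),
    Token(TokenType.FROM, "FROM"),
    Token(TokenType.IDENTIFIER, "users"),
]
rows = format_rows(tokens, highlight_pos=2)
print("\n".join(rows))
```

With `highlight_pos=2`, only the `FROM` row starts with `>>>`; every other row starts with three spaces so the columns stay aligned.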

def format_token_stream(tokens: List[Token], highlight_pos: int = None) -> str:
    """
    Format token stream for pretty printing

    Args:
        tokens: List of tokens to format
        highlight_pos: Optional position to highlight

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("=" * 70)
    lines.append("Token Stream")
    lines.append("=" * 70)

    for i, token in enumerate(tokens):
        marker = ">>>" if i == highlight_pos else "   "
        type_str = f"{token.type.name:15}"
        value_str = f"{repr(token.value):20}"
        lines.append(f"{marker} [{i:3d}] {type_str} {value_str}")

    lines.append("=" * 70)
    return "\n".join(lines)

AST Tree Formatter

format_ast() recursively formats AST node trees with proper indentation to show the hierarchical structure. This is essential for understanding what the parser built.

Formatting Rules

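Nodes: Show type name and attributes (excluding span and any attribute whose value is None)
Lists: Format as [...] with items indented 2 spaces; an empty list is shown as []
Primitives: Show with repr() for clarity (strings show quotes)
None: Displayed explicitly when it is the formatted value itself or a list item; None-valued node attributes are omitted
Nesting: Attribute names indent 2 spaces inside their node; a nested node or list value starts on its own line, 4 spaces deeper than the enclosing node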

Example Output

SelectStatement(
  select_core=
    SelectCore(
      columns=
        [
          ResultColumn(
            expression=
              Identifier(
                name='id'
              )
          )
        ]
      from_clause=
        FromClause(
          source=
            TableReference(
              name=
                QualifiedIdentifier(
                  parts=
                    [
                      'users'
                    ]
                )
            )
        )
    )
)

This visualization makes it easy to verify the parser built the correct structure.
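The formatting rules can be exercised without the parser through a small sketch. `Node`, `Identifier`, `ResultColumn`, and `format_tree` are hypothetical stand-ins written for this example; the real `ASTNode` classes and their attributes are assumptions and may differ.

```python
from dataclasses import dataclass
from typing import Any, Optional


class Node:  # hypothetical stand-in for ASTNode
    pass


@dataclass
class Identifier(Node):
    name: str
    span: Optional[tuple] = None  # span is always skipped by the formatter


@dataclass
class ResultColumn(Node):
    expression: Node
    alias: Optional[str] = None  # None-valued attributes are omitted from the output


def format_tree(node: Any, indent: int = 0) -> str:
    """Recursive formatter following the rules listed above."""
    pad = " " * indent
    if node is None:
        return pad + "None"
    if isinstance(node, list):
        if not node:
            return pad + "[]"
        items = [format_tree(item, indent + 2) for item in node]
        return "\n".join([pad + "["] + items + [pad + "]"])
    if not isinstance(node, Node):
        return pad + repr(node)
    lines = [pad + type(node).__name__ + "("]
    for key, value in vars(node).items():
        if key == "span" or value is None:
            continue
        if isinstance(value, (list, Node)):
            # Nested values go on their own lines, 4 spaces deeper than the node
            lines.append(" " * (indent + 2) + key + "=")
            lines.append(format_tree(value, indent + 4))
        else:
            lines.append(" " * (indent + 2) + f"{key}={value!r}")
    lines.append(pad + ")")
    return "\n".join(lines)


tree = [ResultColumn(expression=Identifier(name="id", span=(0, 2)))]
text = format_tree(tree)
print(text)
```

In the printed result neither `span` nor the unset `alias` appears, which is why a missing attribute in the output does not necessarily mean the parser failed to set it: it may simply be None.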

def format_ast(node: Any, indent: int = 0) -> str:
    """
    Format AST node tree for pretty printing

    Args:
        node: AST node to format
        indent: Current indentation level

    Returns:
        Formatted string representation
    """
    if node is None:
        return " " * indent + "None"

    if isinstance(node, list):
        if not node:
            return " " * indent + "[]"

        lines = [" " * indent + "["]
        for item in node:
            lines.append(format_ast(item, indent + 2))
        lines.append(" " * indent + "]")
        return "\n".join(lines)

    if not isinstance(node, ASTNode):
        return " " * indent + repr(node)

    # Format AST node
    node_type = type(node).__name__
    lines = [" " * indent + f"{node_type}("]

    # Get node attributes (excluding span)
    attrs = {}
    for key, value in node.__dict__.items():
        if key != 'span' and value is not None:
            attrs[key] = value

    for key, value in attrs.items():
        if isinstance(value, (list, ASTNode)):
            lines.append(" " * (indent + 2) + f"{key}=")
            lines.append(format_ast(value, indent + 4))
        else:
            lines.append(" " * (indent + 2) + f"{key}={repr(value)}")

    lines.append(" " * indent + ")")
    return "\n".join(lines)

def format_parser_trace(trace_log: List[str]) -> str:
    """
    Format parser trace log for pretty printing

    Args:
        trace_log: List of trace messages

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("=" * 70)
    lines.append("Parser Trace")
    lines.append("=" * 70)
    lines.extend(trace_log)
    lines.append("=" * 70)
    return "\n".join(lines)

def format_parser_state(state: dict) -> str:
    """
    Format parser state dictionary for pretty printing

    Args:
        state: State dictionary from parser.get_state()

    Returns:
        Formatted string representation
    """
    lines = []
    lines.append("Parser State:")
    lines.append(f"  Position:      {state['pos']}")
    lines.append(f"  Current Token: {state['token_type']}:{repr(state['token_value'])}")
    lines.append(f"  Depth:         {state['depth']}")
    lines.append(f"  Active Method: {state['active_method']}")

    if state['stack']:
        lines.append("  Call Stack:")
        for i, method in enumerate(state['stack']):
            lines.append(f"    {i}. {method}")

    return "\n".join(lines)

@contextmanager
def parser_debug_context(parser: Parser, enable: bool = True):
    """
    Context manager for temporarily enabling parser debug mode

    Args:
        parser: Parser instance
        enable: Whether to enable debug mode

    Yields:
        Parser with debug enabled

    Example:
        with parser_debug_context(parser):
            result = parser.parse()
            parser.print_trace()
    """
    old_debug = parser.debug
    parser.debug = enable

    try:
        yield parser
    finally:
        parser.debug = old_debug
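
The save-and-restore behaviour of the context manager can be checked with a stand-in object. `FakeParser` and `debug_enabled` below are hypothetical names for this sketch; `debug_enabled` copies the pattern of parser_debug_context so the example runs without the real Parser class, and the only assumption is a writable `debug` attribute.

```python
from contextlib import contextmanager


class FakeParser:  # hypothetical stand-in; only the `debug` attribute matters here
    def __init__(self):
        self.debug = False

    def parse(self):
        # Simulate a parse failure to show the flag is still restored
        raise ValueError("unexpected token")


@contextmanager
def debug_enabled(parser, enable=True):
    """Same save/restore pattern as parser_debug_context."""
    old_debug = parser.debug
    parser.debug = enable
    try:
        yield parser
    finally:
        parser.debug = old_debug  # runs on normal exit and when an exception escapes


parser = FakeParser()
seen_inside = None
try:
    with debug_enabled(parser) as p:
        seen_inside = p.debug  # the flag is on inside the block
        p.parse()
except ValueError:
    pass

print(seen_inside, parser.debug)
```

Because the restore happens in a `finally` clause, the previous `debug` value comes back even when parsing raises, so a failed debugging session does not leave the parser in debug mode.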

def print_tokens(tokens: List[Token], highlight_pos: int = None):
    """
    Print token stream to stdout

    Args:
        tokens: List of tokens
        highlight_pos: Optional position to highlight
    """
    print(format_token_stream(tokens, highlight_pos))


def print_ast(node: Any):
    """
    Print AST tree to stdout

    Args:
        node: AST node or list of nodes
    """
    print("=" * 70)
    print("AST")
    print("=" * 70)
    print(format_ast(node))
    print("=" * 70)


def print_state(state: dict):
    """
    Print parser state to stdout

    Args:
        state: State dictionary from parser.get_state()
    """
    print(format_parser_state(state))

One-Function Debugging

debug_parse() is the fastest way to debug parser issues. It performs the complete parse workflow with optional verbose output showing every step.

What It Does

Lexes the SQL into tokens
Prints token stream (if verbose)
Parses with debug tracing enabled
Prints parser trace log (if verbose)
Prints final AST (if verbose)
Returns parsed statements
If parsing raises, prints the trace log and the error (if verbose), then re-raises the exception

Usage

```python
from sqlite_parser.debug import debug_parse

# Quick parse with no output
statements = debug_parse("SELECT * FROM users")

# Full debugging output
statements = debug_parse("SELECT * FROM users", verbose=True)
```

Verbose Output Includes

Token stream with indices and types
Parser trace showing method calls and token consumption
Final AST tree structure

This is perfect for troubleshooting: paste in problematic SQL, set verbose=True, and see exactly what the parser is doing at each step.
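Since the verbose output goes to stdout, it can be captured as a string, for example to attach to a bug report. The sketch below uses the standard library's `contextlib.redirect_stdout`. `capture_stdout` is a hypothetical helper, and `fake_debug_parse` is a stand-in that only mimics the shape of debug_parse's output so the snippet runs without sqlite_parser installed.

```python
import io
from contextlib import redirect_stdout
from typing import Any, Callable, Tuple


def capture_stdout(func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, str]:
    """Run func and return (result, everything it printed to stdout)."""
    buffer = io.StringIO()
    with redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()


def fake_debug_parse(sql: str, verbose: bool = False) -> list:
    # Hypothetical stand-in for debug_parse; prints section titles only.
    if verbose:
        print("Token Stream")
        print("Parser Trace")
        print("AST")
    return [sql]


statements, report = capture_stdout(fake_debug_parse, "SELECT * FROM users", verbose=True)
```

In real use, `debug_parse` would presumably be passed in place of `fake_debug_parse`. If the parse raises, the exception propagates out of this helper and the captured text is discarded, so a variant that must keep the trace on failure would need its own try/except around the call.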

def debug_parse(sql: str, verbose: bool = False) -> List[ASTNode]:
    """
    Parse SQL with debug tracing enabled

    Args:
        sql: SQL string to parse
        verbose: If True, print the token stream, trace log, and AST

    Returns:
        List of parsed statements

    Example:
        statements = debug_parse("SELECT * FROM users", verbose=True)
    """
    from .lexer import Lexer

    lexer = Lexer(sql)
    tokens = lexer.tokenize()

    if verbose:
        print_tokens(tokens)
        print()

    parser = Parser(tokens, debug=True)

    try:
        result = parser.parse()

        if verbose:
            print(format_parser_trace(parser.get_trace_log()))
            print()
            print_ast(result)

        return result

    except Exception as e:
        if verbose:
            print(format_parser_trace(parser.get_trace_log()))
            print()
            print(f"Error: {e}")
        raise

# Export main functions
__all__ = [
    'format_token_stream',
    'format_ast',
    'format_parser_trace',
    'format_parser_state',
    'parser_debug_context',
    'print_tokens',
    'print_ast',
    'print_state',
    'debug_parse',
]