sqlite_parser/utils.py annotated source
Utility Functions and Constants
This module serves as the foundation of the SQLite parser, providing all the essential constants, enumerations, and lookup tables needed for tokenization and parsing.
Key Components
- TokenType Enum: Defines every token type the lexer can produce: SQLite's 147 keywords (plus TRUE/FALSE), operators, delimiters, literals, and control tokens
- Keyword Mapping: Fast O(1) lookup from string to token type
- Precedence Table: Defines operator precedence for expression parsing
- Operator Mappings: Convert tokens to AST operator nodes
- Helper Functions: Common operations like keyword checking and precedence lookup
The design follows separation of concerns: this module contains only static data and pure functions, with no parsing or lexing logic.
```python
"""
Utility Functions and Constants for SQLite SQL Parser

Contains keyword lists, precedence tables, and helper functions.
"""

from enum import Enum, auto
from typing import Dict, Set
from .ast_nodes import BinaryOperator, UnaryOperator
```
Token Type Enumeration
The TokenType enum is the parser's vocabulary. It defines every distinct type of token
that can appear in SQLite SQL, organized into logical categories:
- Keywords: All 147 SQLite reserved words plus TRUE/FALSE
- Operators: Arithmetic, comparison, bitwise, and special operators
- Delimiters: Parentheses, brackets, commas, and punctuation
- Literals: Numbers, strings, blobs, identifiers, and parameters
- Special: EOF and NEWLINE for parser control flow
Using Python's auto() generates sequential integer values automatically, making it easy
to add new token types without manual numbering.
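As a quick illustration (a toy enum, not the real TokenType), auto() assigns 1, 2, 3, ... in declaration order:

```python
from enum import Enum, auto

# Toy enum demonstrating how auto() numbers members sequentially
class Demo(Enum):
    ABORT = auto()   # 1
    ACTION = auto()  # 2
    ADD = auto()     # 3

print(Demo.ACTION.value)  # 2
```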
```python
class TokenType(Enum):
    """Token types for SQLite SQL"""
    # Keywords (147 SQLite keywords)
    ABORT = auto()
    ACTION = auto()
    ADD = auto()
    AFTER = auto()
    ALL = auto()
    ALTER = auto()
    ALWAYS = auto()
    ANALYZE = auto()
    AND = auto()
    AS = auto()
    ASC = auto()
    ATTACH = auto()
    AUTOINCREMENT = auto()
    BEFORE = auto()
    BEGIN = auto()
    BETWEEN = auto()
    BY = auto()
    CASCADE = auto()
    CASE = auto()
    CAST = auto()
    CHECK = auto()
    COLLATE = auto()
    COLUMN = auto()
    COMMIT = auto()
    CONFLICT = auto()
    CONSTRAINT = auto()
    CREATE = auto()
    CROSS = auto()
    CURRENT = auto()
    CURRENT_DATE = auto()
    CURRENT_TIME = auto()
    CURRENT_TIMESTAMP = auto()
    DATABASE = auto()
    DEFAULT = auto()
    DEFERRABLE = auto()
    DEFERRED = auto()
    DELETE = auto()
    DESC = auto()
    DETACH = auto()
    DISTINCT = auto()
    DO = auto()
    DROP = auto()
    EACH = auto()
    ELSE = auto()
    END = auto()
    ESCAPE = auto()
    EXCEPT = auto()
    EXCLUDE = auto()
    EXCLUSIVE = auto()
    EXISTS = auto()
    EXPLAIN = auto()
    FAIL = auto()
    FILTER = auto()
    FIRST = auto()
    FOLLOWING = auto()
    FOR = auto()
    FOREIGN = auto()
    FROM = auto()
    FULL = auto()
    GENERATED = auto()
    GLOB = auto()
    GROUP = auto()
    GROUPS = auto()
    HAVING = auto()
    IF = auto()
    IGNORE = auto()
    IMMEDIATE = auto()
    IN = auto()
    INDEX = auto()
    INDEXED = auto()
    INITIALLY = auto()
    INNER = auto()
    INSERT = auto()
    INSTEAD = auto()
    INTERSECT = auto()
    INTO = auto()
    IS = auto()
    ISNULL = auto()
    JOIN = auto()
    KEY = auto()
    LAST = auto()
    LEFT = auto()
    LIKE = auto()
    LIMIT = auto()
    MATCH = auto()
    MATERIALIZED = auto()
    NATURAL = auto()
    NO = auto()
    NOT = auto()
    NOTHING = auto()
    NOTNULL = auto()
    NULL = auto()
    NULLS = auto()
    OF = auto()
    OFFSET = auto()
    ON = auto()
    OR = auto()
    ORDER = auto()
    OTHERS = auto()
    OUTER = auto()
    OVER = auto()
    PARTITION = auto()
    PLAN = auto()
    PRAGMA = auto()
    PRECEDING = auto()
    PRIMARY = auto()
    QUERY = auto()
    RAISE = auto()
    RANGE = auto()
    RECURSIVE = auto()
    REFERENCES = auto()
    REGEXP = auto()
    REINDEX = auto()
    RELEASE = auto()
    RENAME = auto()
    REPLACE = auto()
    RESTRICT = auto()
    RETURNING = auto()
    RIGHT = auto()
    ROLLBACK = auto()
    ROW = auto()
    ROWID = auto()
    ROWS = auto()
    SAVEPOINT = auto()
    SELECT = auto()
    SET = auto()
    STRICT = auto()
    STORED = auto()
    TABLE = auto()
    TEMP = auto()
    TEMPORARY = auto()
    THEN = auto()
    TIES = auto()
    TO = auto()
    TRANSACTION = auto()
    TRIGGER = auto()
    UNBOUNDED = auto()
    UNION = auto()
    UNIQUE = auto()
    UPDATE = auto()
    USING = auto()
    VACUUM = auto()
    VALUES = auto()
    VIEW = auto()
    VIRTUAL = auto()
    WHEN = auto()
    WHERE = auto()
    WINDOW = auto()
    WITH = auto()
    WITHOUT = auto()

    # Special keywords
    TRUE = auto()
    FALSE = auto()

    # Operators
    PLUS = auto()           # +
    MINUS = auto()          # -
    STAR = auto()           # *
    SLASH = auto()          # /
    PERCENT = auto()        # %
    CONCAT = auto()         # ||
    EQUAL = auto()          # =
    EQUAL2 = auto()         # ==
    NOT_EQUAL = auto()      # !=
    NOT_EQUAL2 = auto()     # <>
    LESS = auto()           # <
    GREATER = auto()        # >
    LESS_EQUAL = auto()     # <=
    GREATER_EQUAL = auto()  # >=
    LEFT_SHIFT = auto()     # <<
    RIGHT_SHIFT = auto()    # >>
    AMPERSAND = auto()      # &
    PIPE = auto()           # |
    TILDE = auto()          # ~
    ARROW = auto()          # ->
    DOUBLE_ARROW = auto()   # ->>

    # Delimiters
    LPAREN = auto()     # (
    RPAREN = auto()     # )
    LBRACKET = auto()   # [
    RBRACKET = auto()   # ]
    COMMA = auto()      # ,
    SEMICOLON = auto()  # ;
    DOT = auto()        # .

    # Literals
    NUMBER = auto()      # 123, 123.45, 1.5e10
    STRING = auto()      # 'string'
    BLOB = auto()        # X'hex'
    IDENTIFIER = auto()  # name, [name], "name", `name`
    PARAMETER = auto()   # ?, ?123, :name, @name, $name

    # Special
    EOF = auto()
    NEWLINE = auto()
```
Keyword Lookup Dictionary
The KEYWORDS dictionary provides O(1) lookup from uppercase string to TokenType.
This is used during lexing to distinguish keywords from identifiers.
Design Decisions
- Case insensitivity: All keys are uppercase because SQLite keywords are case-insensitive. The lexer normalizes input before lookup.
- Complete coverage: Includes all 147 official SQLite keywords from the SQLite documentation
- Easy maintenance: The 1:1 mapping from string to enum makes it trivial to add new keywords
When the lexer encounters an identifier, it checks KEYWORDS to determine if it's actually
a reserved word.
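A minimal sketch of that lexer-side check, using an abridged stand-in for the keyword table (the real module maps all 147 keywords to TokenType members):

```python
# Abridged stand-in for the full KEYWORDS table
DEMO_KEYWORDS = {'SELECT': 'SELECT_TOKEN', 'FROM': 'FROM_TOKEN'}

def classify(word: str) -> str:
    """Uppercase-normalize, then fall back to IDENTIFIER on a miss."""
    return DEMO_KEYWORDS.get(word.upper(), 'IDENTIFIER')

print(classify('select'))     # SELECT_TOKEN  (case-insensitive hit)
print(classify('user_name'))  # IDENTIFIER    (not a keyword)
```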
```python
# Map keyword strings to token types
KEYWORDS: Dict[str, TokenType] = {
    'ABORT': TokenType.ABORT,
    'ACTION': TokenType.ACTION,
    'ADD': TokenType.ADD,
    'AFTER': TokenType.AFTER,
    'ALL': TokenType.ALL,
    'ALTER': TokenType.ALTER,
    'ALWAYS': TokenType.ALWAYS,
    'ANALYZE': TokenType.ANALYZE,
    'AND': TokenType.AND,
    'AS': TokenType.AS,
    'ASC': TokenType.ASC,
    'ATTACH': TokenType.ATTACH,
    'AUTOINCREMENT': TokenType.AUTOINCREMENT,
    'BEFORE': TokenType.BEFORE,
    'BEGIN': TokenType.BEGIN,
    'BETWEEN': TokenType.BETWEEN,
    'BY': TokenType.BY,
    'CASCADE': TokenType.CASCADE,
    'CASE': TokenType.CASE,
    'CAST': TokenType.CAST,
    'CHECK': TokenType.CHECK,
    'COLLATE': TokenType.COLLATE,
    'COLUMN': TokenType.COLUMN,
    'COMMIT': TokenType.COMMIT,
    'CONFLICT': TokenType.CONFLICT,
    'CONSTRAINT': TokenType.CONSTRAINT,
    'CREATE': TokenType.CREATE,
    'CROSS': TokenType.CROSS,
    'CURRENT': TokenType.CURRENT,
    'CURRENT_DATE': TokenType.CURRENT_DATE,
    'CURRENT_TIME': TokenType.CURRENT_TIME,
    'CURRENT_TIMESTAMP': TokenType.CURRENT_TIMESTAMP,
    'DATABASE': TokenType.DATABASE,
    'DEFAULT': TokenType.DEFAULT,
    'DEFERRABLE': TokenType.DEFERRABLE,
    'DEFERRED': TokenType.DEFERRED,
    'DELETE': TokenType.DELETE,
    'DESC': TokenType.DESC,
    'DETACH': TokenType.DETACH,
    'DISTINCT': TokenType.DISTINCT,
    'DO': TokenType.DO,
    'DROP': TokenType.DROP,
    'EACH': TokenType.EACH,
    'ELSE': TokenType.ELSE,
    'END': TokenType.END,
    'ESCAPE': TokenType.ESCAPE,
    'EXCEPT': TokenType.EXCEPT,
    'EXCLUDE': TokenType.EXCLUDE,
    'EXCLUSIVE': TokenType.EXCLUSIVE,
    'EXISTS': TokenType.EXISTS,
    'EXPLAIN': TokenType.EXPLAIN,
    'FAIL': TokenType.FAIL,
    'FILTER': TokenType.FILTER,
    'FIRST': TokenType.FIRST,
    'FOLLOWING': TokenType.FOLLOWING,
    'FOR': TokenType.FOR,
    'FOREIGN': TokenType.FOREIGN,
    'FROM': TokenType.FROM,
    'FULL': TokenType.FULL,
    'GENERATED': TokenType.GENERATED,
    'GLOB': TokenType.GLOB,
    'GROUP': TokenType.GROUP,
    'GROUPS': TokenType.GROUPS,
    'HAVING': TokenType.HAVING,
    'IF': TokenType.IF,
    'IGNORE': TokenType.IGNORE,
    'IMMEDIATE': TokenType.IMMEDIATE,
    'IN': TokenType.IN,
    'INDEX': TokenType.INDEX,
    'INDEXED': TokenType.INDEXED,
    'INITIALLY': TokenType.INITIALLY,
    'INNER': TokenType.INNER,
    'INSERT': TokenType.INSERT,
    'INSTEAD': TokenType.INSTEAD,
    'INTERSECT': TokenType.INTERSECT,
    'INTO': TokenType.INTO,
    'IS': TokenType.IS,
    'ISNULL': TokenType.ISNULL,
    'JOIN': TokenType.JOIN,
    'KEY': TokenType.KEY,
    'LAST': TokenType.LAST,
    'LEFT': TokenType.LEFT,
    'LIKE': TokenType.LIKE,
    'LIMIT': TokenType.LIMIT,
    'MATCH': TokenType.MATCH,
    'MATERIALIZED': TokenType.MATERIALIZED,
    'NATURAL': TokenType.NATURAL,
    'NO': TokenType.NO,
    'NOT': TokenType.NOT,
    'NOTHING': TokenType.NOTHING,
    'NOTNULL': TokenType.NOTNULL,
    'NULL': TokenType.NULL,
    'NULLS': TokenType.NULLS,
    'OF': TokenType.OF,
    'OFFSET': TokenType.OFFSET,
    'ON': TokenType.ON,
    'OR': TokenType.OR,
    'ORDER': TokenType.ORDER,
    'OTHERS': TokenType.OTHERS,
    'OUTER': TokenType.OUTER,
    'OVER': TokenType.OVER,
    'PARTITION': TokenType.PARTITION,
    'PLAN': TokenType.PLAN,
    'PRAGMA': TokenType.PRAGMA,
    'PRECEDING': TokenType.PRECEDING,
    'PRIMARY': TokenType.PRIMARY,
    'QUERY': TokenType.QUERY,
    'RAISE': TokenType.RAISE,
    'RANGE': TokenType.RANGE,
    'RECURSIVE': TokenType.RECURSIVE,
    'REFERENCES': TokenType.REFERENCES,
    'REGEXP': TokenType.REGEXP,
    'REINDEX': TokenType.REINDEX,
    'RELEASE': TokenType.RELEASE,
    'RENAME': TokenType.RENAME,
    'REPLACE': TokenType.REPLACE,
    'RESTRICT': TokenType.RESTRICT,
    'RETURNING': TokenType.RETURNING,
    'RIGHT': TokenType.RIGHT,
    'ROLLBACK': TokenType.ROLLBACK,
    'ROW': TokenType.ROW,
    'ROWID': TokenType.ROWID,
    'ROWS': TokenType.ROWS,
    'SAVEPOINT': TokenType.SAVEPOINT,
    'SELECT': TokenType.SELECT,
    'SET': TokenType.SET,
    'STORED': TokenType.STORED,
    'STRICT': TokenType.STRICT,
    'TABLE': TokenType.TABLE,
    'TEMP': TokenType.TEMP,
    'TEMPORARY': TokenType.TEMPORARY,
    'THEN': TokenType.THEN,
    'TIES': TokenType.TIES,
    'TO': TokenType.TO,
    'TRANSACTION': TokenType.TRANSACTION,
    'TRIGGER': TokenType.TRIGGER,
    'UNBOUNDED': TokenType.UNBOUNDED,
    'UNION': TokenType.UNION,
    'UNIQUE': TokenType.UNIQUE,
    'UPDATE': TokenType.UPDATE,
    'USING': TokenType.USING,
    'VACUUM': TokenType.VACUUM,
    'VALUES': TokenType.VALUES,
    'VIEW': TokenType.VIEW,
    'VIRTUAL': TokenType.VIRTUAL,
    'WHEN': TokenType.WHEN,
    'WHERE': TokenType.WHERE,
    'WINDOW': TokenType.WINDOW,
    'WITH': TokenType.WITH,
    'WITHOUT': TokenType.WITHOUT,
    'TRUE': TokenType.TRUE,
    'FALSE': TokenType.FALSE,
}
```
Operator Precedence Table
The PRECEDENCE dictionary defines the parsing priority for binary operators.
Lower numbers bind less tightly (evaluate later), following the precedence climbing
algorithm used in expression parsing.
Precedence Levels (from lowest to highest)
- OR (1): Logical OR - lowest precedence, binds loosest
- AND (2): Logical AND
- NOT (3): Logical negation (unary, but listed here for reference)
- Comparison (4): =, <, >, IS, IN, LIKE, BETWEEN, etc.
- Bitwise (5): Bit shifts and bitwise AND/OR
- Additive (6): +, -, || (string concatenation)
- Multiplicative (7): *, /, % (highest precedence, binds tightest)
Example
In 1 + 2 * 3 OR 4 < 5 AND 6:
- 2 * 3 evaluates first (precedence 7)
- the addition 1 + (2 * 3) follows (precedence 6)
- 4 < 5 comparison (precedence 4)
- AND (precedence 2)
- OR (precedence 1) last
Note: Special operators like BETWEEN, IN, and LIKE have their own parsing methods,
but are listed here at precedence 4 for when they appear as binary operators.
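The table drives precedence climbing in the expression parser. Below is a self-contained sketch of the algorithm over a toy arithmetic subset; the real parser works on Token objects and AST nodes, not strings and tuples:

```python
# Toy precedence table mirroring the shape of PRECEDENCE
PREC = {'OR': 1, 'AND': 2, '<': 4, '+': 6, '*': 7}

def parse_expr(tokens, min_prec=1):
    """Precedence climbing: take an operand, then fold in operators
    whose precedence is at least min_prec."""
    left = tokens.pop(0)  # operand (toy case: a number or name)
    while tokens and PREC.get(tokens[0], 0) >= min_prec:
        op = tokens.pop(0)
        # Left-associative: the right side may only use tighter operators
        right = parse_expr(tokens, PREC[op] + 1)
        left = (op, left, right)
    return left

tree = parse_expr('1 + 2 * 3 OR 4 < 5 AND 6'.split())
print(tree)
# ('OR', ('+', '1', ('*', '2', '3')), ('AND', ('<', '4', '5'), '6'))
```

Note how the nesting matches the example above: * binds before +, and OR ends up at the root because it binds loosest.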
```python
# Operator precedence (lower number = lower precedence)
PRECEDENCE: Dict[TokenType, int] = {
    TokenType.OR: 1,
    TokenType.AND: 2,
    TokenType.NOT: 3,
    # Comparison operators
    TokenType.EQUAL: 4,
    TokenType.EQUAL2: 4,
    TokenType.NOT_EQUAL: 4,
    TokenType.NOT_EQUAL2: 4,
    TokenType.LESS: 4,
    TokenType.GREATER: 4,
    TokenType.LESS_EQUAL: 4,
    TokenType.GREATER_EQUAL: 4,
    TokenType.IS: 4,
    TokenType.IN: 4,
    TokenType.LIKE: 4,
    TokenType.GLOB: 4,
    TokenType.MATCH: 4,
    TokenType.REGEXP: 4,
    TokenType.BETWEEN: 4,
    # Bitwise
    TokenType.LEFT_SHIFT: 5,
    TokenType.RIGHT_SHIFT: 5,
    TokenType.AMPERSAND: 5,
    TokenType.PIPE: 5,
    # Arithmetic and string
    TokenType.PLUS: 6,
    TokenType.MINUS: 6,
    TokenType.CONCAT: 6,
    TokenType.STAR: 7,
    TokenType.SLASH: 7,
    TokenType.PERCENT: 7,
    # COLLATE handled specially
    # Unary handled specially
}
```
Associativity
SQL operators are left-associative by default, meaning a - b - c parses as (a - b) - c.
This empty set is a placeholder for any right-associative operators that might be added in
the future (though SQL has very few right-associative operators).
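In a precedence climber, associativity decides the minimum precedence passed to the right-hand recursion: prec + 1 for left-associative operators, prec for right-associative ones. A small numeric check (illustration only, not parser code) of why the grouping matters for a - b - c:

```python
# Left-associative (SQL's default): (10 - 4) - 3
left_assoc = (10 - 4) - 3    # 3
# A right-associative reading would give: 10 - (4 - 3)
right_assoc = 10 - (4 - 3)   # 9
print(left_assoc, right_assoc)  # 3 9
```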
```python
# Right-associative operators
RIGHT_ASSOCIATIVE: Set[TokenType] = set()  # Most operators in SQL are left-associative
```
Helper Functions
These utility functions provide clean abstractions over the lookup tables, making the lexer and parser code more readable and maintainable.
Each function has a single, clear responsibility:
- Keyword checking and normalization
- Operator classification and precedence lookup
- Token-to-AST-node conversion
```python
def is_keyword(word: str) -> bool:
    """Check if a word is a SQLite keyword"""
    return word.upper() in KEYWORDS


def normalize_keyword(word: str) -> str:
    """Normalize keyword to uppercase"""
    return word.upper()


def get_keyword_token_type(word: str) -> TokenType:
    """Get token type for keyword"""
    return KEYWORDS.get(word.upper(), TokenType.IDENTIFIER)


def is_binary_operator(token_type: TokenType) -> bool:
    """Check if token type is a binary operator"""
    return token_type in PRECEDENCE


def get_precedence(token_type: TokenType) -> int:
    """Get precedence for operator"""
    return PRECEDENCE.get(token_type, 0)


def is_right_associative(token_type: TokenType) -> bool:
    """Check if operator is right-associative"""
    return token_type in RIGHT_ASSOCIATIVE
```
Token-to-AST Operator Mappings
These dictionaries convert TokenTypes to AST operator enums, bridging the gap between the lexer (which produces tokens) and the parser (which builds AST nodes).
Binary Operators
Maps operator tokens to BinaryOperator enum values. For example, when the parser
sees a TokenType.PLUS token, it converts it to BinaryOperator.ADD for the AST node.
This separation allows the same token type to map to different operators in different contexts (though SQL doesn't often need this flexibility).
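A hypothetical sketch of how a parser consumes such a mapping when building an AST node. The two-member enums, the two-entry table, and the BinaryExpr dataclass here are illustrative stand-ins, not the module's real types:

```python
from dataclasses import dataclass
from enum import Enum, auto

class Tok(Enum):    # stand-in for TokenType
    PLUS = auto()
    STAR = auto()

class BinOp(Enum):  # stand-in for BinaryOperator
    ADD = auto()
    MULTIPLY = auto()

# Stand-in for TOKEN_TO_BINARY_OP
TOK_TO_OP = {Tok.PLUS: BinOp.ADD, Tok.STAR: BinOp.MULTIPLY}

@dataclass
class BinaryExpr:   # stand-in AST node
    op: BinOp
    left: object
    right: object

# The parser sees a PLUS token and emits an ADD node
node = BinaryExpr(TOK_TO_OP[Tok.PLUS], 1, 2)
print(node.op)  # BinOp.ADD
```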
```python
# Token type to binary operator mapping
TOKEN_TO_BINARY_OP: Dict[TokenType, BinaryOperator] = {
    TokenType.PLUS: BinaryOperator.ADD,
    TokenType.MINUS: BinaryOperator.SUBTRACT,
    TokenType.STAR: BinaryOperator.MULTIPLY,
    TokenType.SLASH: BinaryOperator.DIVIDE,
    TokenType.PERCENT: BinaryOperator.MODULO,
    TokenType.EQUAL: BinaryOperator.EQUAL,
    TokenType.EQUAL2: BinaryOperator.EQUAL2,
    TokenType.NOT_EQUAL: BinaryOperator.NOT_EQUAL,
    TokenType.NOT_EQUAL2: BinaryOperator.NOT_EQUAL2,
    TokenType.LESS: BinaryOperator.LESS_THAN,
    TokenType.GREATER: BinaryOperator.GREATER_THAN,
    TokenType.LESS_EQUAL: BinaryOperator.LESS_EQUAL,
    TokenType.GREATER_EQUAL: BinaryOperator.GREATER_EQUAL,
    TokenType.AND: BinaryOperator.AND,
    TokenType.OR: BinaryOperator.OR,
    TokenType.AMPERSAND: BinaryOperator.BIT_AND,
    TokenType.PIPE: BinaryOperator.BIT_OR,
    TokenType.LEFT_SHIFT: BinaryOperator.LEFT_SHIFT,
    TokenType.RIGHT_SHIFT: BinaryOperator.RIGHT_SHIFT,
    TokenType.CONCAT: BinaryOperator.CONCAT,
    TokenType.IS: BinaryOperator.IS,
    TokenType.IN: BinaryOperator.IN,
    TokenType.LIKE: BinaryOperator.LIKE,
    TokenType.GLOB: BinaryOperator.GLOB,
    TokenType.MATCH: BinaryOperator.MATCH,
    TokenType.REGEXP: BinaryOperator.REGEXP,
}
```
Unary Operators
Maps unary operator tokens to UnaryOperator enum values. Unary operators appear before
their operand (prefix position): -5, NOT condition, ~bitmask.
Note that + and - can be both unary and binary operators. The parser determines which
based on context (whether there's a left operand).
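One common way to make that context decision (a sketch, not the parser's actual code): a + or - is unary when nothing that could end an operand precedes it, i.e. at the start of an expression or right after another operator or an opening parenthesis:

```python
def is_unary_position(tokens, i):
    """'+'/'-' at index i is unary if no completed operand precedes it."""
    if i == 0:
        return True  # expression start: must be a prefix sign
    prev = tokens[i - 1]
    # After an operator or '(' there is no left operand, so it's a prefix sign
    return prev in {'+', '-', '*', '/', '('}

tokens = ['-', '5', '+', '-', '3']   # the expression -5 + -3
print(is_unary_position(tokens, 0))  # True  (leading minus)
print(is_unary_position(tokens, 2))  # False ('+' has left operand 5)
print(is_unary_position(tokens, 3))  # True  (minus right after '+')
```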
```python
# Token type to unary operator mapping
TOKEN_TO_UNARY_OP: Dict[TokenType, UnaryOperator] = {
    TokenType.PLUS: UnaryOperator.PLUS,
    TokenType.MINUS: UnaryOperator.MINUS,
    TokenType.NOT: UnaryOperator.NOT,
    TokenType.TILDE: UnaryOperator.BITNOT,
}
```