42208

ParseKit - SQLite parser going into infinite recursion

For my application, I am trying to build an SQLite parser. As my application is using Objective-C, ParseKit seems like a good option. I read SQLite's syntax diagrams and built a grammar based on them. However, when I try to parse something using this grammar, the parser goes into infinite recursion.

The only statements I need are SELECT, INSERT, UPDATE, and DELETE (I need SELECT mostly because the others refer to it). My @start is designed to handle multiple statements separated by semicolons:

@start = statement (';' statement)*; statement = select_stmt | insert_stmt | update_stmt | delete_stmt ;

The statements are as follows:

select_stmt = select_core ( compound_operator select_core )* order_expr? limit_expr? ; select_core = 'select' ( 'distinct' | 'all' )? result_column ( ',' result_column )* from_expr where_expr group_expr ; result_column = '*' | table_name '.' '*' | expr ( 'as'? column_alias )? ; insert_stmt = 'insert' or_on_failure? 'into' table_name insert_expr ; insert_expr = insert_expr_cols? (insert_expr_values | select_stmt) ; insert_expr_cols = '(' column_name ( ',' column_name )* ')' ; insert_expr_values = 'values' '(' expr ( ',' expr )* ')' ; insert_expr_defaults = 'default' 'values' ; update_stmt = 'update' or_on_failure? qualified_table_name update_expr where_expr? limited_expr? ; update_expr = 'set' update_expr_col ( ',' update_expr_col )* ; update_expr_col = column_name '=' expr ; delete_stmt = 'delete' 'from' qualified_table_name where_expr? limited_expr? ;

And their supporting expressions:

order_expr = 'order' 'by' ordering_term (',' ordering_term)* ; limit_expr = 'limit' expr ( ( 'offset' | ',' ) expr )? ; from_expr = 'from' join_source ; where_expr = 'where' expr ; group_expr = 'group' 'by' expr ( ',' expr )? ( 'having' expr )? ; join_source = single_source ( join_operator single_source join_constraint? )* ; single_source = table_name 'as' table_alias indexed_by? | '(' select_stmt ')' ( 'as'? table_alias )? | '(' join_source ')' ; or_on_failure = 'or' on_failure ; on_failure = 'rollback' | 'abort' | 'replace' | 'fail' | 'ignore' ; limited_expr = order_expr limit_expr ;

Names, aliases:

database_name = name ; table_name = (database_name '.')? name ; column_name = (table_name '.')? name ; table_alias = name ; column_alias = name ; index_name = name ; type_name = name+ ( '(' number (',' number)? ')' )? ; function_name = name ; collation_name = name ; qualified_table_name = table_name indexed_by? ;

Miscellaneous operators, etc:

indexed_by = 'indexed' 'by' index_name | 'not' 'indexed' ; unary_operator = symbol ; binary_operator = symbol ; compound_operator = 'union' 'all'? | 'intersect' | 'except' ; join_operator = ',' | 'natural'? ( 'left' 'outer'? | 'inner' | 'cross' ) 'join' ; join_constraint = 'on' expr | 'using' '(' column_name ( ',' column_name )* ')' ;

Basic types:

literal = number | string | 'null' | 'current_time' | 'current_data' | 'current_timestamp' ; number = Number ; string = Word | QuotedString ; name = Word | QuotedString ; symbol = Symbol;

And the EXPR:

expr = literal | column_name | unary_operator expr | expr binary_operator expr | function_name '(' ( '*' | 'distinct'? expr ( ',' expr )* )? ')' | '(' expr ')' | 'cast' '(' expr 'as' type_name ')' | expr 'collate' collation_name | expr 'not'? ( 'like' | 'glob' | 'regexp' | 'match' ) expr ( 'escape' expr )? | expr ( 'isnull' | 'notnull' | 'not' 'null' ) | expr 'is' 'not'? expr | expr 'not'? 'between' expr 'and' expr | expr 'not'? 'in' ( table_name | '(' ( select_stmt | expr ( ',' expr )* )? ')' ) | ( 'not'? 'exists' )? '(' select_stmt ')' | 'case' expr? ( 'when' expr 'then' expr )+ ( 'else' expr )? 'end' ;

When I stepped through the code, the basic path was @start -> statement -> select_stmt -> select_core -> result_column -> expr -> expr -> expr...

After about 8-9k calls between PKParser's matchAndAssemble: and PKParser/Subclass's allMatchesFor:, something dies, usually from a EXC_BAD_ACCESS error (and then LLDB can't do anything either).

P.S.: If you're going to post an answer saying, 'Oh, you should really do/use this,' A) I like Objective-C. Don't tell me not to use it. It's my choice. My response will likely be a rant. And B) I tried digging through SQLite's source to use their parser. I never got anywhere. If you think I should use that, kindly post the source for their parser as a single file with no other dependencies.

Answer1:

Developer of ParseKit here.

First, see my previous answers on debugging ParseKit grammars and battling infinite recursion in ParseKit grammars.

<hr>

I think there might be an issue in the very first line (but I'm not a SQL expert, so I'm not sure). Shouldn't that be:

@start = (statement ';')+; <hr>

I would strongly recommend using camel case instead of underscores, as underscores will make your Objective-C callbacks very awkward and ugly. That is why camel case is the convention in ParseKit grammars.

<hr>

However, I see the main issue. Your grammar includes <strong>Left Recursion</strong> which is not allowed in ParseKit grammars. Particularly in your expr prodcution (I haven't looked closely to see if it is elsewhere too).

ParseKit is fine with recursion, but <strong>not left recusion</strong>. The best explanation of this is in Steven Metsker's "Building Parsers with Java". Or search the web.

But basically left recursion is when a production immediately references itself (on the left side of an expression):

e = e '-' Number;

or

e = Number | e '-' Number;

Instead, you must design your grammars to remove left recursion like:

e = Number ('-' Number)*;

Recommend

  • Oracle More than one materialized view on a materialized log
  • Java Template library similar to ZPT (attribute language)
  • tcpdf encode chinese character
  • C++ calling the default constructor with parens vs without parens [duplicate]
  • How can i automate 'Settings' app in real iOS devices?
  • Get XML response value with GDataXML
  • JPA CDI Injecting DAO into an Entity
  • Sum data table columns using linq
  • Rails AREL .where statement
  • Disable/remove close icon on Kendo Grid's default group column
  • SQL Server re-calculate or not?
  • How to name a 'group' check box in Adobe Reader when wanting to fill form by FDF / XFDF
  • Is there any way to use wpdb prepare statements for array implode(' OR ', $myArray)?
  • SF2 Functional tests : “Resetting the container is not allowed when a scope is active”
  • SQL - Select lowest values with group by and order by?
  • Flex items with same property values are rendering in different sizes
  • Use sed with regex and (
  • How to load more than one div at a time
  • Doctrine2 bulk import try to work with another entity
  • Linq Merge lists
  • cygwin cannot exec 'git-add--interactive' permission denied
  • Query to find the duplicates between the name and number in table
  • Grails calculated field in SQL
  • Custom Tabgroup Appcelerator
  • Record samples being played with OpenAL
  • Breeze - Deleted Items nav properties bug
  • Django: Count of Group Elements
  • Linq Objects Group By & Sum
  • Adding a button at the bottom of a table view
  • Getting last autonumber in access
  • Read text file and split every line in MSBuild
  • javaw.exe and eclipse startup problems
  • How to add a column to a Pandas dataframe made of arrays of the n-preceding values of another column
  • Delete MySQLi record without showing the id in the URL
  • GridView Sorting works once only
  • Unit Testing MVC Web Application in Visual Studio and Problem with QTAgent
  • Rails 2: use form_for to build a form covering multiple objects of the same class
  • python regex in pyparsing
  • embed rChart in Markdown
  • need help with bizarre java.net.HttpURLConnection behavior