Flex and terminating state machine for reading strings

Question:

My flex file is given below. Beyond trivial symbols, it defines a state machine to read strings: it enters the string state whenever it encounters a " and leaves it at the next ". When I feed the generated scanner an input with two consecutive strings, like this:

"this" "apple"

It correctly identifies this but fails to find apple. Why does this happen? I call BEGIN(INITIAL) when the string ends, but that does not help.

/* sample simple scanner */
%{
int num_lines = 0;
#define CLASS 10
#define LAMBDA 1
#define DOT 2
#define PLUS 3
#define OPEN 4
#define CLOSE 5
#define NUM 6
#define ID 7
#define INVALID 8
#define MAX_STR_CONST 256;
#define COMMENT 11;
char string_buf[256];
char *string_buf_ptr;
char string_buf_cmnt[256];
char *string_buf_ptr_cmnt;
int size = 0;
%}

%x str
%x comment1
%x comment2

%%

\"  { string_buf_ptr = (char*)malloc(8); size = 0; BEGIN(str);}

<str>\"  { /* saw closing quote - all done */
    /* return string constant token type and
     * value to parser */
    *string_buf_ptr = '\0'; /* apppend the end of string with null */
    string_buf_ptr = string_buf_ptr - size; /* scale back string ptr to start */
    int i = 0;
    for (; i < size; i++){
        yytext[i]=*(string_buf_ptr + i); /* copy each character to yytext */
    }
    yytext[i]='\0'; /* apppend the end of string with null */
    free(string_buf_ptr);
    BEGIN(INITIAL); /* go back to start */
    return ID;
}

<str>\n  {
    /* error - unterminated string constant */
    /* generate error message */
    //printf("error is here\n");
}

<str>\\0 ;

<str>\\[0-7]{1,3} {
    /* octal escape sequence */
    int result;
    (void) sscanf( yytext + 1, "%o", &result );
    if (result == 0x00){
        *string_buf_ptr++ = '0';
    }
    else {
        if ( result > 0xff ){ /* error, constant is out-of-bounds */}
        else{*string_buf_ptr++ = result;}
    }
    size++;
}

<str>\\[0-9]+ {
    /* generate error - bad escape sequence; something
     * like '\48' or '\0777777' */
}

<str>\\n  *string_buf_ptr++ = '\n'; size++;
<str>\\t  *string_buf_ptr++ = '\t'; size++;
<str>\\r  *string_buf_ptr++ = '\r'; size++;
<str>\\b  *string_buf_ptr++ = '\b'; size++;
<str>\\f  *string_buf_ptr++ = '\f'; size++;
<str>\\a  *string_buf_ptr++ = '\a'; size++;

<str>\\(.|\n)  *string_buf_ptr++ = yytext[1]; size++;

<str>[^\\\n\"]+  {
    //printf("there\n");
    char *yptr = yytext;
    int i = 0;
    while ( *yptr ) {
        *string_buf_ptr++ = *yptr++;
        yytext[i] = *(string_buf_ptr-1);
        size++;
        i++;
    }
}

[ ]+  //printf("space\n");

%%

main(int argc, char **argv)
{
    int res;
    yyin = stdin;
    while(res = yylex()) {
        printf("class: %d lexeme: %s line: %d\n", res, yytext, num_lines);
    }
}

Answer1:

You can't overwrite yytext like that. yytext points into the scanner's input buffer; it is not guaranteed to point at usable memory beyond the current token, and in any case you're only allowed to modify the yyleng characters of the current token, not anything past them.

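So what's happening is this: when the closing " is matched, the current token is just that one character, yet you copy the whole string this plus a terminating NUL into yytext. The copy runs over the pending input and overwrites the " which starts the second string, so apple is never recognized as a string.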
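To make the clobbering concrete, here is a minimal sketch in plain C. It is a simplified model and an assumption on my part, not flex's actual buffer handling: the array input stands in for the scanner buffer, and overwrite_token and byte_at_second_quote_after_overwrite are hypothetical names that imitate the copy loop in your closing-quote action.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Simplified model (an assumption, not flex internals): the token text
 * lives in the same array as the input that has not been scanned yet.
 * Offsets refer to the sample input "this" "apple" including its quotes.
 */
enum { CLOSING_QUOTE = 5, SECOND_OPENING_QUOTE = 7 };

/* Imitates the copy loop: writes len bytes plus a '\0' starting at the
 * current token, regardless of how long that token really is. */
void overwrite_token(char *token, const char *decoded, size_t len)
{
    assert(token != NULL && decoded != NULL);
    memcpy(token, decoded, len);
    token[len] = '\0';
}

/* Returns the byte found where the second string's opening quote was. */
char byte_at_second_quote_after_overwrite(void)
{
    char input[] = "\"this\" \"apple\"";

    /* The current token is the one-character closing quote at offset 5,
     * but five bytes ("this" plus NUL) are written starting there. */
    overwrite_token(input + CLOSING_QUOTE, "this", 4);
    return input[SECOND_OPENING_QUOTE];
}
```

In this model byte_at_second_quote_after_overwrite() returns 'i' rather than '"': the five bytes written at offset 5 cover offsets 5 through 9, which include the opening quote of the second string.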

Instead of overwriting yytext, just make your string_buf_ptr visible to the caller of yylex by either making it a global variable or passing a pointer to a return value as an extra argument to the lexer (see the YY_DECL macro). Of course, that will force you to change your memory management strategy, but your current memory management won't work either: the buffer comes from malloc(8), and some tokens are likely to be more than seven characters long.
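One possible replacement for the fixed malloc(8) is a buffer that grows on demand. The sketch below is only an illustration: struct strbuf, strbuf_push and strbuf_reset are hypothetical names, not part of flex, and the starting capacity of 16 is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical growable string buffer; zero-initialize before use. */
struct strbuf {
    char  *data;  /* heap storage, NULL until the first append */
    size_t len;   /* characters stored, not counting the '\0' */
    size_t cap;   /* bytes currently allocated */
};

/* Append one character, doubling the allocation when it is full.
 * Returns 0 on success and -1 if realloc fails (the old contents
 * stay valid in that case). */
int strbuf_push(struct strbuf *sb, char c)
{
    assert(sb != NULL);
    if (sb->len + 2 > sb->cap) {            /* need room for c and '\0' */
        size_t new_cap = sb->cap ? sb->cap * 2 : 16;
        char *p = realloc(sb->data, new_cap);
        if (p == NULL)
            return -1;
        sb->data = p;
        sb->cap = new_cap;
    }
    sb->data[sb->len++] = c;
    sb->data[sb->len] = '\0';               /* keep it terminated */
    return 0;
}

/* Forget the contents but keep the allocation for the next string. */
void strbuf_reset(struct strbuf *sb)
{
    assert(sb != NULL);
    sb->len = 0;
    if (sb->data != NULL)
        sb->data[0] = '\0';
}
```

In the scanner, the opening-quote action would call strbuf_reset and each <str> action would call strbuf_push instead of writing through string_buf_ptr, so the length of a string is no longer limited by the initial allocation.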

Personally, I'd avoid the global, and keep a static char* which can be passed back to the caller via an out parameter. Then you can require that the caller make a copy of the string if they need to keep it beyond the next call to yylex. You could insist that the caller free the string, but the advantage of the "caller copies" strategy is that no copy will be made if the caller doesn't need to persist the string. This is precisely the strategy used with yytext; yytext will be destroyed by the next call to yylex so a caller needing to persist the token's value needs to make a copy of yytext.
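A cut-down scanner along those lines might look like the following. Treat it as a sketch under assumptions rather than a drop-in replacement: lexeme_out, buf and buf_add are names I made up, only a few of your escape rules are kept, and all non-string tokens are ignored. YY_DECL, yytext, yyleng, BEGIN and the %option names are standard flex.

```lex
%option noyywrap nounput noinput

%{
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define ID 7

/* Give yylex an out parameter (hypothetical name: lexeme_out). */
#define YY_DECL int yylex(const char **lexeme_out)

static char  *buf;       /* owned by the lexer, reused on every call */
static size_t len, cap;

/* Append n bytes to buf, growing it as needed; buf stays terminated. */
static void buf_add(const char *s, size_t n)
{
    if (len + n + 1 > cap) {
        cap = (len + n + 1) * 2;
        buf = realloc(buf, cap);
        if (buf == NULL) { perror("realloc"); exit(EXIT_FAILURE); }
    }
    memcpy(buf + len, s, n);
    len += n;
    buf[len] = '\0';
}
%}

%x str

%%

\"                { len = 0; buf_add("", 0); BEGIN(str); }
<str>\"           { BEGIN(INITIAL); *lexeme_out = buf; return ID; }
<str>\\n          { buf_add("\n", 1); }
<str>\\t          { buf_add("\t", 1); }
<str>\\(.|\n)     { buf_add(yytext + 1, 1); }
<str>[^\\\n\"]+   { buf_add(yytext, yyleng); }
<str>\n           { fprintf(stderr, "unterminated string\n"); BEGIN(INITIAL); }
[ \t\n]+          { /* skip whitespace */ }
.                 { /* other tokens omitted in this sketch */ }

%%

int main(void)
{
    const char *lexeme;  /* valid only until the next call to yylex */
    int tok;

    while ((tok = yylex(&lexeme)) != 0)
        printf("class: %d lexeme: %s\n", tok, lexeme);
    free(buf);
    return 0;
}
```

Because yytext is only ever read here, the pending input stays intact and the second string in "this" "apple" is scanned normally. A caller that needs a lexeme beyond the next call to yylex has to copy it, for example with strdup.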
