@q file: o2_defs.w@> @q% Copyright Dave Bone 1998 - 2015@> @q% /*@> @q% This Source Code Form is subject to the terms of the Mozilla Public@> @q% License, v. 2.0. If a copy of the MPL was not distributed with this@> @q% file, You can obtain one at http://mozilla.org/MPL/2.0/.@> @q% */@> @q% See o2_types.w document for defiitions@> @*1 {\bf{LR1 definitions}}.\fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/o2_structures.1"}{1}{1} \fbreak \fbreak \convertMPtoPDF{"/usr/local/yacco2/diagrams/o2_structures.2"}{1}{1} \fbreak \fbreak Let's review what makes up a state:\fbreak \ptindent{1) subrules's specific element --- state's to gen vectors} \ptindent{2) rules's follow set} \ptindent{3) state's entry symbol --- Start state has no entry symbol} \ptindent{4) state's list of conflict states} A state is a set of productions (subrules) where each production's current symbol being worked on is some position along its string. A state from the arithmetic grammar discussed earlier could be represented by the following example where the ``.'' indicates the position within the production's string being worked on in the state:\fbreak \ptindent{S $\rightarrow$ E \ . \ $\bot$} \ptindent{E $\rightarrow$ E \ . \ $+$ T} The above state has 2 productions where each symbol being worked on is in position 2 of their respective strings. These are items in the state having their production configuration of subrule $\otimes$ string position. Sometimes I shall call each entry a {\bf state element} rather than an item. A rule's follow set gets created when it is present in point 1: ie, the state's element is a rule and its follow set is the string to its right that generates teminals. Please see at the beginning of this document the ``follow set'' definition. Point 3's entry symbol identifies the symbol used to gen the state and to quickly help in determining whether two states are equivalent for potential state merging. Point 4 is a requirement to support merging of states from 2 different closured-part state networks. It supplies the lr1 states that have reduce / reduce or shift conditions that require the lr1 compatibility check. When there is a proposed merger from 2 different closured-part contexts, it is the union of their follow sets that gives the reducing subrules their lookahead sets. Consequently the lr1 conflict states of the ``merged into context'' must be evaluated for lr1 compatibility. \fbreak \fbreak Parts of a state:\fbreak \ptindent{1) Closured } \ptindent{2) Transitive } A closured part are all the state's items whose elements start their strings. They have been brought into the state by the ``closure'' operation caused by a state's element being a rule. A transitive part are those productions whose elements are to the right of the start element. Items used to generate a new state are called {\bf ``core items''}. \fbreak \fbreak State generation:\fbreak All states are generated from a closured part of a state. Its productions are walked along their strings producing transitive states until their strings are completely consumed. This holds for the ``Start state'' that starts things off by generating all it transitive children. Thereafter each transitive state is visited and assessed for its closured components that then generates its own transitive states. This goes on until all the generated states have been visited. \fbreak \fbreak Contexts:\fbreak \ptindent{1) follow sets} \ptindent{2) production's reduced lookahead set} A production's reduction occurs when all its string has been recognized. For it to reduce, it depends on the context of its follow set within its birthing closured state: This is the lr1 compatibility context that is refered to as {\bf lookahead}. When there is no conflict of interest between competing productions (reduces with possibly shifts) within the state, this becomes a lr(0) situation. Without regard for the lookahead context this now shifts the error detection to the state that must deal with the lookahead as the current terminal for shifting. This strategy is used when state merges takes place. Instead of exploding the number of states sensitive to only its own lookahead context, mergers combine the follow set contexts as long as there is no state incompatibilities created. 2 or more competing reducing productions requires their follow set contexts to resolve the reducing conflict: reduce / reduce or reduce / shift. Shifts of symbols are local to the reducing state. Of course lookaheads are context sensitive according to each productions birthing states. In LR(1) terms, the lookahead is deterministic and provided by the follow sets having only 1 symbol string as lookahead. \fbreak \fbreak Follow set and right bounded condition:\fbreak This condition is where a rule is the last symbol in its production string. Its closured productions inherit the follow set of the production string(s) that closured it. These follow sets are found in the gening closured state environment. Consequently right bounded closured productions must be gened in case it could produce a conflict state. Why? Merges taking place above this to-be-gened production from a different closured state generation could produce a conflict as the merger is not aware that one of its transitive states has a future conflict condition dependent on these merged follow set contexts.\fbreak As an example please see David Spector from ``SIGPLAN VOL 23 DEC/88'' where my gened ``lr1\_sp5.lex'' grammar illustrates this condition. \fbreak \fbreak Epsilon rules and right bounded condition:\fbreak If the last symbol is a rule and is epsilonable, then the right bounded condition moves left inwards from the end of the symbol string to the next right-to-left symbol. Now if that symbol is a rule it is considered a right bounded requiring generation within the current closured state environment. This is a recursive definition: right bounded condition gens closures having the right bounded condition that also requires immediate generation. When it comes time to gen the closured-part state of the right bounded components, they will have been already gened and their conflict states entered against their gening closured-part state environment. \fbreak \fbreak Significance of right bounded condition:\fbreak It demands that its future closured state generation be associated with the generating closured state that created it. Restated: It must be generated prematurely by its spawning closured state. This way any of its transitive states that have the lr1 conflict condition will get placed in the conflict state list of the generating closured state so that a proposed merger relative to the original closured state is aware of the potential conflict and checked accordingly. \fbreak \fbreak Some synonyms:\fbreak ``Closure-only'' state:\fbreak A state where all its state elements are configurations with their start symbol. This is your one and only Start state.\fbreak \fbreak ``Transitive'' state:\fbreak A state where at least one state element is not the starting symbol of a production's string. \fbreak \fbreak ``Closured-only-part'' of a state:\fbreak All state's elements whose symbols start the subrule string. Synonym: ``closured state''.\fbreak \fbreak ``Closured-only-subrules'' of a state:\fbreak Productions's symbol strings brought into the state by the closure operation caused by a state's element being a rule. The ``closured-only'' part are those subrules birthed within this state to generate all its ``closured-only'' subrules transitive states. \fbreak \fbreak ``Conflict state'':\fbreak A state having at least 2 items where at least one of the productions is reducing. \fbreak \fbreak Building a state core:\fbreak There are only 2 contexts that provide the generation fodder for a state:\fbreak \ptindent{1) Start (closure-only) state --- Start rule's grammar tree definition} \ptindent{2) Transitive states --- generated from a closured state} The ``closured-part'' of a state generates all its transitive states from its closure subrules regardless of the type of state --- Start or transitive. Point 1 starts things off. It generates all its transitive states. Point 2 deals with transitive states from point 1 that have closured-only residues that need generating. Of course these newly generated transitive states could be merged into the existing lr1 state network if they meet the lr1 compatibility criteria. Eventually the newly added transitive states will be assessed for their ``closured-part'' generation. \fbreak \fbreak Some Merge points:\fbreak First, only conflict states are tested. They are supplied by their associated closured generating state. When a merge takes place, the state being absorbed by the older closured network deposits its follow set info against the merged into state. Second, the conflict states of the ``merged into'' state network must also be added to the gening closured state's conflict state list. Why? If the state was not merged, eventually all its gened states would have the equivalent conflict states as the proposed merger. The only refinement to this is conflict states should only be added that are eventually generated from the ``merged into'' state. Now if future mergers are proposed into this newly closured state's network, the conflict states of the absorbing network will also be there for the testing. \fbreak \fbreak In summary, Lr1 state generation is discrete in its generation passes. Pass one: generate all the states for the start state from its ``closured-only'' subrules. Pass two and greater deals with ``closured-only'' parts of transitive states that have not been completely gened. Remember a subrule is associated with its birth state that brought it into existance. These transitive states are of previous passes. Each transitive pass looks for the next transitive state to generate until all its lr state network have been built. The ``transitive state pass'' generates all its ``closured-only'' subrules independently of the past generations. \fbreak \fbreak Now the state implementation bedevils this definition as does Goethic churches --- one usually does not see the infrastructure required to build it unless the project ran out of money and stands unfinished but open to its engineering secrets. So here's the scaffolding for my sanity. A note on the following type defs sections: to make ``cweave'' behave in formatting its the document --- a slight ahem until i debug / correct ``cweave''. The cause is templates that came after the original program was written. @*2 |gen_context| definition/implementation.\fbreak The context identifying the closure state and vector combo gening its states. This context is needed to prevent same closure state merges whose vectors are different but generate common states having different follow sets. If merged the contributing contexts could make it non lr1. See David Spector's paper ``Efficicent Full Lr1 Parser Generators'': G2 example. The context is maintained per state that gened it and per state's subrules vectors: |state_element|. @+= gen_context::gen_context(state* S,Voc_ENO Ve):for_closure_state_(S),gen_vector_(Ve){} @*2 |state_element| definition/implementation.\fbreak Basic building block of a state's set of subrules' string symbols. Laced throughout |state_element| are linkages between the past, present, and future of its lr1 state generation. This is the scaffolding to build the state network. |sr_def_element_| {\bf is for tracing purposes only}. I could have gone the long way by getting the tree node's content and then fetch its definition but this makes life easier when truth telling takes place --- {\bf terminal-def or rule-def}. The |la_set_| only gets created at the end-of-string point. It's a fast way to scratch-pad potential merges and the lr1 breathalyzer test.\fbreak @*3 |state_element| implementation.\fbreak @+= state_element::state_element(AST* Elem) :cs_vector_combo_gening_it_(0,-1) ,sr_element_(Elem) ,sr_def_element_(0) ,its_enum_id_(-1) ,subrule_def_(0) ,closure_state_(0) ,goto_state_(0) ,previous_state_(0) ,reduced_state_(0) ,self_state_(0) ,previous_state_element_(0) ,next_state_element_(0) ,la_set_(0) ,common_la_set_idx_(-1) { @; } @ Determine entry symbol.\fbreak Eases tracing of lr states easier instead of just displaying its enumerated value. @= CAbs_lr1_sym* sym = AST::content(*Elem); Voc_ENO id = sym->enumerated_id__; switch(id){ case T_Enum::T_refered_rule_:{ @; rule_def* rd = rr->its_rule_def(); sr_def_element_ = rd; its_enum_id_ = rd->enum_id(); break; } case T_Enum::T_T_eosubrule_:{ @; sr_def_element_ = eos; its_enum_id_ = eos->enumerated_id__; la_set_ = new LA_SET_type(); break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ @; sr_def_element_ = eos; its_enum_id_ = eos->enumerated_id__; break; } case T_Enum::T_T_called_thread_eosubrule_:{ @; sr_def_element_ = eos; its_enum_id_ = eos->enumerated_id__; break; } case T_Enum::T_refered_T_:{ @; T_terminal_def* td = rt->its_t_def(); sr_def_element_ = td; its_enum_id_ = td->enum_id(); break; } } @*3 |~state_element|.\fbreak @+= state_element::~state_element(){ if (la_set_ != 0) delete la_set_; } @*2 Lookahead Comments.\fbreak Please see the |la_express| grammar as to how it calculates the thread's end-of-parse stream lookahead. The ``eolr'' metaterminal is discussed in length as to what it represents and how it is exploded when lookahead expressions are used in a thread grammar ``parallel-la-boundary'' construct. @*3 |add_fs_setA_to_LA|.\fbreak Substitute ``parallel-reduce-operator for ``parallel-operator'' to eliminate the ambiguity between look ahead for reduction purposes versus calling thread for shift purposes. @+= void state_element::add_fs_setA_to_LA(follow_element& Fe,LA_SET_type& La_to_fill_in){ FOLLOW_SETS_ITER_type i = Fe.follow_set_.begin(); FOLLOW_SETS_ITER_type ie = Fe.follow_set_.end(); for(;i!=ie;++i){ LA_SET_ITER_type j = La_to_fill_in.find(*i); if(j == La_to_fill_in.end()) { T_in_stbl* t_sym = *i; @; } } } @ Is there back to back thread call?.\fbreak Before the thread call reduce was made lr(0), it used the \REDshift meta terminal to reduce the first thread call as the \PARshift operator within a state was ambiguous in the 2 contexts --- run the thread or reduce the called thread. @= if(t_sym->t_def()->enum_id() != LR1_PARALLEL_OPERATOR){ La_to_fill_in.insert(t_sym); }else{ using namespace yacco2_stbl; T_sym_tbl_report_card report_card; find_sym_in_stbl(report_card,*LR1_REDUCE_OPERATOR_LITERAL); @=T_in_stbl* td = (T_in_stbl*)report_card.tbl_entry_->symbol_;@>@/ La_to_fill_in.insert(td); } @*3 |calc_la| --- fill the reduced element's la set by walking follow set graph.\fbreak Fill in the lookahead set for a reduced subrule by walking its follow sets. I protect against merges vs right bounded transitions that could cycle by |VISITED_MERGE_STATES_IN_LA_CALC|. The gened la is checked empty so indicate as bad. @+= bool state_element::calc_la(state_element& La_to_fill_in){ if(La_to_fill_in.la_set_ == NULL){ return false; // no set to fill;so not lr1 } if (La_to_fill_in.reduced_state_ != La_to_fill_in.self_state_) return true; VISITED_MERGE_STATES_IN_LA_CALC.clear(); La_to_fill_in.la_set_->clear(); state* cs = La_to_fill_in.closure_state_; CAbs_lr1_sym* sym = AST::content(*La_to_fill_in.sr_element_); T_ENO id = sym->enumerated_id__; switch(id){ case T_Enum::T_T_eosubrule_:{ @; rule_def* rd = eos->its_rule_def(); RULE_ENO r_id = rd->enum_id(); S_FOLLOW_SETS_ITER_type i = cs->state_s_follow_set_map_.find(r_id); follow_element* fe = i->second; add_fs_setA_to_LA(*fe,*La_to_fill_in.la_set_); if(fe->transitions_.empty() != true){ fill_la_from_transition(La_to_fill_in,fe->transitions_); } if(fe->merges_.empty() != true){ fill_la_from_merge(La_to_fill_in,fe->merges_,r_id); } break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ break; } case T_Enum::T_T_called_thread_eosubrule_:{ break; } } if (La_to_fill_in.la_set_->empty() == true){ return false; // not lr1 as cant have an empty la set }else{ return true; // la set ok } } @*3 |fill_la_from_merge|.\fbreak @+= void state_element:: fill_la_from_merge(state_element& La_to_fill_in,MERGES_type& Merge,RULE_ENO Rule_no){ MERGES_ITER_type i = Merge.begin(); MERGES_ITER_type ie = Merge.end(); for(;i!=ie;++i){ state* cs = *i; STATES_SET_ITER_type ii = VISITED_MERGE_STATES_IN_LA_CALC.find(cs); if(ii != VISITED_MERGE_STATES_IN_LA_CALC.end()) return; VISITED_MERGE_STATES_IN_LA_CALC.insert(cs); S_FOLLOW_SETS_ITER_type i = cs->state_s_follow_set_map_.find(Rule_no); follow_element* fe = i->second; add_fs_setA_to_LA(*fe,*La_to_fill_in.la_set_); if(fe->transitions_.empty() != true){ fill_la_from_transition(La_to_fill_in,fe->transitions_); } if(fe->merges_.empty() != true){ fill_la_from_merge(La_to_fill_in,fe->merges_,fe->rule_no_); } } } @*3 |fill_la_from_transition|.\fbreak @+= void state_element:: fill_la_from_transition(state_element& La_to_fill_in,TRANSITIONS_type& Transition){ TRANSITIONS_ITER_type i = Transition.begin(); TRANSITIONS_ITER_type ie = Transition.end(); for(;i!=ie;++i){ follow_element* fe = *i; add_fs_setA_to_LA(*fe,*La_to_fill_in.la_set_); if(fe->transitions_.empty() != true){ fill_la_from_transition(La_to_fill_in,fe->transitions_); } if(fe->merges_.empty() != true){ fill_la_from_merge(La_to_fill_in,fe->merges_,fe->rule_no_); } } } @*3 |find_state_element_s_rule_no|.\fbreak Used for state merging to calculate a reducing subrule's lookahead set. @+= RULE_ENO state_element::find_state_element_s_rule_no(){ return subrule_def_->its_rule_def()->enum_id(); } @*2 Follow set definition for a rule.\fbreak The input set of strings for the rule's follow set are provided by the state's |S_VECTORS_type| that is a map of 3 generic enumerate types --- ``rule-defs'', ``T-defs'', and ``eosubrule'' variants. Of particular interest in this map are the ``rule-def''s. The |state_element|s associated with the rule are the GPS into each subrule's symbol string. One can view this as the state's contributors list to generate both its lr states and the referenced rules' follow sets for this state. Now these input follow set strings are the strings to its right of its GPS. This is supplied thru the symbol's grammar next brother tree node. @*3 Follow set implementation.\fbreak @+= AST* follow_element::rule_def_t(){return rule_def_t_;} state* follow_element::its_state(){return its_state_;} follow_element::follow_element(state* State) :rule_no_(-1) ,rule_def_t_(0) ,its_state_(State) {} follow_element::follow_element(RULE_ENO Rule_no,state_element& State_elem,AST& Rule_def_t) :rule_no_(Rule_no) ,rule_def_t_(&Rule_def_t) ,its_state_(State_elem.closure_state_) {} void follow_element::add_follow_set_contributor(AST* SR_element){@/ sr_elements_.push_back(SR_element); } @*3 |add_follow_set_transition|.\fbreak Find eosubule's lhs rule. This gives the ``rule no'' to fetch its follow set element from its closure state's follow set map. Remember, epsilon stays within its closured state whilst a reducing subrule {\bf needs to go back to the start of its symbol string} for its spawning rule and hence its follow set.\fbreak @+= void follow_element::add_follow_set_transition(state_element& State_elem,T_eosubrule& Eos){@/ rule_def* rd = Eos.its_rule_def(); RULE_ENO eno = rd->enum_id(); S_FOLLOW_SETS_ITER_type fsi = State_elem.closure_state_->state_s_follow_set_map_.find(eno); if(fsi == State_elem.closure_state_->state_s_follow_set_map_.end()){ return; } follow_element* fe = fsi->second; @; transitions_.push_back(fe); } @ Left recursion on rule check.\fbreak Don't want to cycle on the same state spot: S1.A transitions on S1.A caused by a grammar's rule having left rule recursion. @= if((rule_no_ == eno) && (its_state_->state_no_ == State_elem.closure_state_->state_no_)) return; @ Add the terminal to the follow set. @+= void follow_element::add_T_to_follow_set(AST* Refered_T){ @; follow_set_.insert(t->t_in_stbl()); } @ Get refered-t. @= @=refered_T* t = (refered_T*)AST::content(*Refered_T);@>@/ @*3 |remove_merge_closure_info|. @+= void follow_element::remove_merge_closure_info(){ merges_.pop_front(); } @*3 |add_merge_closure_info|.\fbreak Watch for rule having subrules that are merged with common closure state. U should only have 1 such state in list so throw out duplicates. @+= void follow_element::add_merge_closure_info(state& To_merge_closure_state){ state* tm = &To_merge_closure_state; MERGES_ITER_type i = merges_.begin(); MERGES_ITER_type ie = merges_.end(); for(;i!=ie;++i){ state* s = *i; if(s == tm) return; } merges_.push_front(&To_merge_closure_state); } @*2 State definition/implementation.\fbreak |vectored_into_by_elem_| is the goto element from the spawning state that enters this state. The ``xxx-def'' symbol is provided by |vectored_into_by_elem_sym_| that is used for tracing purposes. It is one of the elements in determining whether 2 states are equal. I use the symbol's defining enumerate value which was enumerated across all the Grammar's vocabulary: Rules and Terminals. |START_STATE_ENUMERATE| symbol representing -1 is used to accommodate a ``closure only'' state where there is no symbol entering the start state as a Grammar's vocabulary enumeration begins at 0. |closure_rule_list_| provides referenced rules in the state to complete the state's elements. |follow_rule_list_| is a fast way to deal with building follow sets for the state as it is a list of rule numbers that are keys into |state_s_to_vector| that indirectly supplies the follow string contexts. To support rules recycling optimization, a quasi closure state for any rule of the grammar has been added. Why the addition? Recycling of rules requires a use count derived from recursion and subrules references to the rule. My first attempt was wrong as i did not take into account that a rhs subrule could have a referenced rule that could be indirectly referenced by (derived by) a suffixed referenced rule. So i need to derive the state containing the closured items and them analyse its content to see whether indirect referencing is taking place. So create ctor of state with no tree and a |closure_only_derives| method. @*3 State's map of ``to vector'' elements.\fbreak |S_VECTORS_type| is the state's map of ``to vector'' elements of ``rule-ref'', ``T-ref'', and |eosubrule|. These elements produce the ``goto'' state eminating out of the lr1 state. This is a white lie as the |eosubrule| eminates nothing. It represents either the epsilon condition if its the first element of a subrule or a fully consumed subrule: its string of symbols has been consumed and so to be reduced. ``rule-ref'', ``T-ref'' are proxies to their definitions whereby their enumerated values are unique. The second part of the map is the list of {\bf same state elements} having identical enumerated keys. These vectors are the fodder to generate the next set of states eminating from this state and all the ``closured-only part'' states progeny. The list is sorted by the AST address inside the state's emlement so that state equivalences can be determined. U might raise the point: doesn't it matter what order the elements are placed inside the state to generate the lr1 state network: FIFO? NO! Let's review why.\fbreak \ptindent{1) ``closured only'' state composed of 1st position only subrules' elements.} \ptindent{2) this state's follow sets are static: first set from strings to rt of refered rules.} \ptindent{3) only the closured-only subrules are fully generated at the same time.} \ptindent{4) transitive states only continue gening their subrules from the closured state.} \ptindent{5) the resulting lr1 states are evaluated for lr1 conflicts.} \ptindent{6) apply the logic above to gened states having incomplete gened closured-only parts.} \fbreak It is point 2 that is interesting: the birthing closured states of its reducing subrules supplies their lookahead. This means the closured state is generated completely before an assessment needs to take place. The lr1 assessment determines whether the gened states are lr(1) compatible. This check goes only against states that have reducing subrules so that the reduce / reduce and shift / reduce conditions can be verified. How is the lr1 condition evaluated? Easy, the reducing subrule's rule within its birthing closured state contains its follow set: ie its lookahead terminals. All it takes is to make sure that the intersection of all the reducing subrules' follow sets is empty and that the state's shift terminals are not in any of the reducing subrules' follow sets. This shift set can be considered an invisible follow set that is applied at the same time to the other reducing follow sets. Keeping a list of conflicting states within the ``closure-only or part'' state when a state merge is proposed allows one to apply this lr1 condition for compatibility against the potential merged follow sets. Remember a gened state is produced out of its closured state. Thus mergers mean use the follow sets of each closure state. The ``state to merge into'' already has it list of lr1 conflict states in its associated gening closured state that need checking before Mr. Goodwrench nods. \fbreak \fbreak Key: element's enumerate: ``rule-def'', ``T-def'', and ``eosubrule'' \fbreak Elements in list: state's elements that contain a grammar's tree node address \fbreak @*2 State implementation.\fbreak @+= @*3 |state(AST* Start_rule_t)|. @+= state::state(AST* Start_rule_t) :cs_vector_combo_gening_it_(0,-1) ,vectored_into_by_elem_(START_STATE_ENUMERATE) ,vectored_into_by_elem_sym_(0) ,state_no_(0) ,closure_state_birthing_it_(0) ,state_type_(0) ,arbitrator_name_(0) { create_start_state(*Start_rule_t); add_state_to_gbl_lr1_state_tbls(this); } @*3 |state()|: for closure only state of derives. @+= state::state() :cs_vector_combo_gening_it_(0,-1) ,vectored_into_by_elem_(START_STATE_ENUMERATE) ,vectored_into_by_elem_sym_(0) ,state_no_(0) ,closure_state_birthing_it_(0) ,state_type_(0) ,arbitrator_name_(0) {} @*3 |state(AST& Vectored_into_id_t)| --- Create transitive state. @+= state::state(Voc_ENO Eno,CAbs_lr1_sym* Entry_sym) :cs_vector_combo_gening_it_(0,-1) ,vectored_into_by_elem_(Eno) ,vectored_into_by_elem_sym_(Entry_sym) ,state_no_(0) ,closure_state_birthing_it_(0) ,state_type_(0) ,arbitrator_name_(0) {} @*3 |closure_only_derives| --- Create a closure only derives state.\fbreak Gen a derives only state for a rule so that the rule's recycle count is correct for indirect references. Used by |rules_use_cnt.lex| grammar. @+= void state::closure_only_derives(AST* Rule_tree){ gen_context gening_context(0,-1); add_rule_s_subrules_to_state(*Rule_tree,gening_context,*this); add_closure_rules_subrules_to_state(gening_context,*this); } @*1 {\bf{Generate states}}. @*3 |add_element_to_state_vector|. \fbreak The |state_s_vector_|[enumerate of element] is positive except when its ``eosubrule''. Why? Cuz i'm (re)cursing on this terminal in 2 ways: as an end-of-subrule condition for a production and in grammars that are referencing it: is this devine? So when it's a real end-of-string situation i make the key negative. When it's a ``refered-T'' then use its contained ``terminal-def'' positive image that could be the containment of ``eosubrule''. The element list is sorted on its AST address. Why the order? This makes sure that 2 elements having been brought into the state by different order will equate when state merges are proposed. @+= void state::add_element_to_state_vector (Voc_ENO Elem_id,state_element& Elem){ S_VECTORS_ITER_type i = state_s_vector_.find(Elem_id); if(i == state_s_vector_.end()){ state_s_vector_[Elem_id]= S_VECTOR_ELEMS_type(); i = state_s_vector_.find(Elem_id); } S_VECTOR_ELEMS_type& el = i->second; if(el.empty() == true){ el.push_back(&Elem); return; } S_VECTOR_ELEMS_ITER_type j = el.begin(); S_VECTOR_ELEMS_ITER_type je = el.end(); for(;j!=je;++j){ state_element* se = *j; if(Elem.sr_element_ < se->sr_element_){ el.insert(j,&Elem); return; } } el.push_back(&Elem); } @*3 |add_closure_rules_subrules_to_state|.\fbreak Why the |closure_rule_list_.end()| in the loop? As i'm adding items to it while iterating thru it, i play it safe by testing each cycle for the end-of-container condition via the function call rather than a local variable set before the iteration. @+= void state::add_closure_rules_subrules_to_state(gen_context& Possible_gen_context,state& Closure_state){ CLOSURE_RULES_type processed_rules_set; loop_until_empty:; if(closure_rule_list_.empty() == true) return; CLOSURE_RULES_ITER_type i = closure_rule_list_.begin(); if(processed_rules_set.find(*i) != processed_rules_set.end()){ closure_rule_list_.erase(*i); goto loop_until_empty; } processed_rules_set.insert(*i); rule_def* rd = (*i)->r_def(); AST* t = rd->rule_s_tree(); add_rule_s_subrules_to_state(*t,Possible_gen_context,Closure_state); goto loop_until_empty; } @*3 |add_rule_to_closure_list|. @+= void state::add_rule_to_closure_list(rule_in_stbl* Rule_in_stbl){ CLOSURE_RULES_ITER_type i = closure_rule_list_.find(Rule_in_stbl); if(i == closure_rule_list_.end()){ closure_rule_list_.insert(Rule_in_stbl); derives_closure_rule_list_.insert(Rule_in_stbl); rule_def* rd = Rule_in_stbl->r_def(); if(rd->closure_rules_making_up_first_set()->empty() == true) return; CLOSURE_RULES_type* cr = rd->closure_rules_making_up_first_set(); CLOSURE_RULES_ITER_type j = cr->begin(); CLOSURE_RULES_ITER_type je = cr->end(); for(;j!=je;++j){ rule_in_stbl* ris = *j; i = closure_rule_list_.find(ris); if(i == closure_rule_list_.end()){ derives_closure_rule_list_.insert(ris); closure_rule_list_.insert(ris); } } } } @*3 |add_rule_s_subrules_to_state|.\fbreak Add rule's productions to state due to closured operation. A wrinkle: if the possible gen context has a gen vector of -1, this means the state's closure elements are not right bounded and are not assiciated with generating context and so its gen context will be out its own state and own vector. @+= void state:: add_rule_s_subrules_to_state(AST& Start_Rule_def_t,gen_context& Possible_gen_context,state& Closure_state_associate_with){ AST* subrules_t = AST::get_1st_son(Start_Rule_def_t); AST* first_element_t(0); CAbs_lr1_sym* first_element(0); Voc_ENO id(START_STATE_ENUMERATE); Voc_ENO cs_id(START_STATE_ENUMERATE); for(;subrules_t !=0;subrules_t = AST::brother(*subrules_t)){ @=T_subrule_def* srd = (T_subrule_def*)AST::content(*subrules_t);@>@/ first_element_t = AST::get_1st_son(*subrules_t); first_element = AST::content(*first_element_t); state_element* se = new state_element(first_element_t); se->subrule_def_ = srd; se->self_state_ = this; se->closure_state_ = this; se->closured_state_gening_it_ = &Closure_state_associate_with; id = first_element->enumerated_id__; switch(id){ case T_Enum::T_refered_rule_:{ @=refered_rule* rr = (refered_rule*)first_element;@>@/ rule_def* rd = rr->its_rule_def(); rule_in_stbl*ris= rr->Rule_in_stbl(); RULE_ENO r_id = rd->enum_id(); cs_id = r_id; se->cs_vector_combo_gening_it_.gen_vector_ = r_id; add_element_to_state_vector(r_id,*se); add_rule_to_follow_list(r_id); add_rule_to_closure_list(ris); break; } case T_Enum::T_T_eosubrule_:{ T_ENO t_id = T_Enum::T_T_eosubrule_; cs_id = t_id; se->reduced_state_ = this; se->cs_vector_combo_gening_it_.gen_vector_ = t_id; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ T_ENO t_id = T_Enum::T_T_null_call_thread_eosubrule_; cs_id = t_id; se->reduced_state_ = this; se->cs_vector_combo_gening_it_.gen_vector_ = t_id; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_T_called_thread_eosubrule_:{ T_ENO t_id = T_Enum::T_T_called_thread_eosubrule_; cs_id = t_id; se->reduced_state_ = this; se->cs_vector_combo_gening_it_.gen_vector_ = t_id; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_refered_T_:{ @=refered_T* rt = (refered_T*)first_element;@>@/ T_terminal_def* td = rt->its_t_def(); T_ENO t_id = td->enum_id(); cs_id = t_id; se->cs_vector_combo_gening_it_.gen_vector_ = t_id; add_element_to_state_vector(t_id,*se); break; } } if(Possible_gen_context.gen_vector_ != -1){ se->cs_vector_combo_gening_it_ = Possible_gen_context; }else{ se->cs_vector_combo_gening_it_.for_closure_state_ = this; se->cs_vector_combo_gening_it_.gen_vector_ = cs_id; } } } @*3 |crt_core_items_of_state|.\fbreak Add subrules to the new state being created. The iterators walk the vector list of the spawning state.\fbreak \fbreak Right bounded check:\fbreak \hbox{\hskip.5in 1) Rx \subrule{} \Alpha . Ra t}\fbreak \hbox{\hskip.5in 2) Ry \subrule{} \Beta . Ra}\fbreak \hbox{\hskip.5in 3) Rz \subrule{} . Ra Rb}\fbreak The period indicates where within the string of the production the current vector is. Point 1 is not right bounded due to t representing a terminal. Point 2 is right bounded as the follow set of its Ra hits the end-of-string condition and so must transition back along to Ry's follow sets. Point 3 could be right bounded if Rb is epsilonable. Cuz of point 2, the newly generated state's closure rules will be associated with the current closure state generation. @+= bool state::crt_core_items_of_state (S_VECTOR_ELEMS_ITER_type& Iter_begin ,S_VECTOR_ELEMS_ITER_type& Iter_end ,gen_context& Gening_context){ bool rt_bnded(false); AST* to_element_t(0); CAbs_lr1_sym* to_element(0); Voc_ENO id(START_STATE_ENUMERATE); for(;Iter_begin !=Iter_end;++Iter_begin){ state_element* from_se = *Iter_begin; to_element_t = AST::brother(*from_se->sr_element_); bool se_rt_bnded_condition = is_str_rt_bnded(to_element_t); if(se_rt_bnded_condition == true) rt_bnded = se_rt_bnded_condition; to_element = AST::content(*to_element_t); switch(to_element->enumerated_id__){ case T_Enum::T_T_null_call_thread_eosubrule_:{//bypass to_element_t = AST::brother(*to_element_t); se_rt_bnded_condition = is_str_rt_bnded(to_element_t); if(se_rt_bnded_condition == true) rt_bnded = se_rt_bnded_condition; to_element = AST::content(*to_element_t); break; } case T_Enum::T_T_called_thread_eosubrule_:{//bypass to_element_t = AST::brother(*to_element_t); se_rt_bnded_condition = is_str_rt_bnded(to_element_t); if(se_rt_bnded_condition == true) rt_bnded = se_rt_bnded_condition; to_element = AST::content(*to_element_t); break; } } state_element* se = new state_element(to_element_t); se->subrule_def_ = from_se->subrule_def_; se->self_state_ = this; se->closure_state_ = from_se->closure_state_; se->closured_state_gening_it_ = Gening_context.for_closure_state_; //|se->cs_vector_combo_gening_it_ = from_se->cs_vector_combo_gening_it_;| se->cs_vector_combo_gening_it_ = Gening_context; if(se_rt_bnded_condition == true){ if(from_se->closured_state_gening_it_ != Gening_context.for_closure_state_){ // common prefix syndrome: keep it pure: same as from se se->cs_vector_combo_gening_it_ = from_se->cs_vector_combo_gening_it_; se->closured_state_gening_it_ = from_se->closured_state_gening_it_; }else{ se->cs_vector_combo_gening_it_ = Gening_context; se->closured_state_gening_it_ = Gening_context.for_closure_state_; } } from_se->goto_state_ = se->self_state_; se->previous_state_ = from_se->self_state_; from_se->next_state_element_ = se; se->previous_state_element_ = from_se; id = to_element->enumerated_id__; @; } return rt_bnded; } @*4 Add subrule's element to the being gened state's vector. @= switch(id){ case T_Enum::T_refered_rule_:{ @=refered_rule* rr = (refered_rule*)to_element;@>@/ rule_def* rd = rr->its_rule_def(); rule_in_stbl*ris= rr->Rule_in_stbl(); RULE_ENO r_id = rd->enum_id(); add_element_to_state_vector(r_id,*se); add_rule_to_follow_list(r_id); add_rule_to_closure_list(ris); break; } case T_Enum::T_T_eosubrule_:{ T_ENO t_id = T_Enum::T_T_eosubrule_; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ T_ENO t_id = T_Enum::T_T_null_call_thread_eosubrule_; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_T_called_thread_eosubrule_:{ T_ENO t_id = T_Enum::T_T_called_thread_eosubrule_; add_element_to_state_vector(-t_id,*se); break; } case T_Enum::T_refered_T_:{ @=refered_T* rt = (refered_T*)to_element;@>@/ T_terminal_def* td = rt->its_t_def(); T_ENO t_id = td->enum_id(); add_element_to_state_vector(t_id,*se); break; } } @*3 |create_start_state| --- Create start state. @+= void state::create_start_state(AST& Start_rule_t){ gen_context gening_context(0,-1); add_rule_s_subrules_to_state(Start_rule_t,gening_context,*this); add_closure_rules_subrules_to_state(gening_context,*this); crt_start_rule_s_follow_set(Start_rule_t); create_follow_sets_of_state(); this->state_type_ = determine_reduced_state_type(this); //|Print_dump_state(this);| } @*3 |gen_transitive_states_for_closure_context|.\fbreak Loop thru the state's closure/vector where the subrule's start position is the first symbol. The loop depends on the gening closure state/vector. When a state is constructed, the closured rules are associated with either:\fbreak \ptindent{1) itself state/vector} \ptindent{2) a right bounded rule context that associates with the gening context creating the state} @+= bool state:: gen_transitive_states_for_closure_context@/ (gen_context& For_gening_context,state& For_closure_state,state& State){@/ @; lrclog << "gen_transitive_states_for_closure_context for closure state: " << For_gening_context.for_closure_state_->state_no_ << endl; S_VECTORS_ITER_type i = State.state_s_vector_.begin(); S_VECTORS_ITER_type ie = State.state_s_vector_.end(); for(;i !=ie;++i){// read state's goto symbols Voc_ENO eno = i->first; @; For_gening_context.gen_vector_ = eno; @; lrclog << " for vector: " << For_gening_context.gen_vector_ << endl; bool continue_gening = gen_a_state(For_gening_context,For_closure_state,State,i); if(continue_gening == NOT_LR1_COMPATIBLE){ @; return NOT_LR1_COMPATIBLE; } } @; return LR1_COMPATIBLE;// gened states ok } @*4 Unchain my reduced states if end-of-subrule and continue to next item.\fbreak Rip thru the subrule's symbols depositing its reduced state as eyeball info. @= if(eno ==-T_Enum::T_T_eosubrule_) { S_VECTOR_ELEMS_type elem_list = i->second; S_VECTOR_ELEMS_ITER_type j = elem_list.begin(); S_VECTOR_ELEMS_ITER_type je = elem_list.end(); for(;j!=je;++j){ state_element* se = *j; se->reduced_state_ = se->self_state_; state* reduced_state = se->self_state_; while (se->previous_state_element_ != 0){ se->previous_state_element_->reduced_state_ = reduced_state; se = se->previous_state_element_; } } continue; // eosubrule: onto next vector to gen in closure state's vector loop } @*3 |gen_transitive_states_balance_for_closure_vector|. \fbreak @+= bool state:: gen_transitive_states_balance_for_closure_vector@/ (gen_context& Gen_context,state& For_closure_state,state& Goto_state){@/ @; lrclog << "gen_transitive_states_balance_for_closure_vector for <" << Gen_context.for_closure_state_->state_no_ << "," << Gen_context.gen_vector_ << "> goto state: " << Goto_state.state_no_ << endl; S_VECTORS_ITER_type i = Goto_state.state_s_vector_.begin(); S_VECTORS_ITER_type ie = Goto_state.state_s_vector_.end(); for(;i !=ie;++i){// read state's goto symbols Voc_ENO eno = i->first; @; bool continue_gening = gen_a_state(Gen_context,For_closure_state,Goto_state,i); if(continue_gening == NOT_LR1_COMPATIBLE){ @; return NOT_LR1_COMPATIBLE; // stop gening as not lr1 } } @; return LR1_COMPATIBLE; } @*3 |gen_a_state|.\fbreak The only wrinkle is when a previous closured state has dragged along a common prefix subrule that is not part of its productions being gened. Call this a premature production generation: the production could be partially gened up to where the common prefix differs or the production completely gened as its right-hand-side string of symbols was contained in the other closure state gened productions. When this premature production's closure state is finally generating all its productions, common conflict states from a past closure generation that include premature gened productions must also be assessed for state conflicts. There is a commonality between the conflict states per gened closure states due to their productions that contributed to this conflict state list and so these common states must be added to the premature closure state's conflict state list. The premature gened production could be reducing whilst its common production that dragged it along could be shifting or reducing. The reverse could also be happening: the being gened production could be reducing while the premature production shifting. Thus a proposed merge into a state of this closure state must also have these common conflict states assessed for lr1ness. A good example of this is situation is Deremer and Pennello's paper on ``Efficient computation of lalr(1) lookahead sets'' ACM Transactions on Programming Language and Systems: Vol. 4 no. 4 October 1982 Page 632. Please see my gened grammar ``lalr\_dp1.lex'' illustrating this.\fbreak \fbreak Pathological grammar condition:\fbreak Not lr1 grammar where the gening closure state network is not lr1. So must flag the |gen_a_state| with a returned result: continue or stop. To stop infinite looping within its own closure state / vector being gened, make sure new state being added is lr1 compatible! Instead of analysing the gened network for lr1 compatibility after it is built, do a compatibilty test while it is being built. This stops the infinite looping context!\fbreak \fbreak There are 2 not lr1 compatibilty contexts:\fbreak \ptindent{1) can a state be merged into another closure state's network} \ptindent{2) can not merge and its is not a kosher lr(1) grammar} Point 1) tests the merge for incompatibility and rejects the merge. This does not mean that the grammar is incompatible but that the merged contexts make it incompatible. Point 2) is a state within its own closure state/vector environment which is not okay. @+= bool state::gen_a_state (gen_context& For_gening_context ,state& For_closure_state ,state& Requesting_state,S_VECTORS_ITER_type& Elem_iter){ @; lrclog << "gen_a_state for <" << For_gening_context.for_closure_state_->state_no_ << "," << For_gening_context.gen_vector_ << "> requesting state: " << Requesting_state.state_no_ << endl; gen_context associated_rt_bnded_cs(0,-1); Voc_ENO eno = Elem_iter->first; S_VECTOR_ELEMS_type elem_list = Elem_iter->second; S_VECTOR_ELEMS_ITER_type i = elem_list.begin(); S_VECTOR_ELEMS_ITER_type ie = elem_list.end(); bool compatible(false); for(;i !=ie;++i){// read symbol's element list to gen state_element* se = *i; @; @; @; @; add_state_to_gbl_lr1_state_tbls(s); @; lrclog << "gen_a_state for <" << For_gening_context.for_closure_state_->state_no_ << "," << For_gening_context.gen_vector_ << "> requesting state: " << Requesting_state.state_no_ << " NEW STATE CREATED: " << s->state_no_ << endl; add_state_to_conflict_states_list_if(For_gening_context,*s); compatible= is_state_lr1_compatible(*s);//is new state to be added lr1 compatible? if(compatible==NOT_LR1_COMPATIBLE){// added @; return NOT_LR1_COMPATIBLE;// stop gening: note state added before test as if not lr1 reports properly why } //|Print_dump_state(s);| s->state_type_ = determine_reduced_state_type(s); bool gen_ok = gen_transitive_states_balance_for_closure_vector(For_gening_context,For_closure_state,*s); @; return gen_ok;// state gened so finished going thru element list } @; return LR1_COMPATIBLE; } @*4 Is element vector associated with the current closure state being gened?. @= if (( (se->cs_vector_combo_gening_it_.for_closure_state_ == For_gening_context.for_closure_state_) && (se->cs_vector_combo_gening_it_.gen_vector_ == For_gening_context.gen_vector_) ) != true){ @; lrclog << "gen_a_state Bypass subrule as its gening <" << se->cs_vector_combo_gening_it_.for_closure_state_->state_no_ << "," << se->cs_vector_combo_gening_it_.gen_vector_ << "> different then gening <" << For_gening_context.for_closure_state_->state_no_ << "," << For_gening_context.gen_vector_ << ">" << endl; continue; } @*4 Is element gened from common prefix of an earlier closure state gen?. @= if(se->goto_state_ != 0){ @; lrclog << "gen_a_state subrule COMMON PREFIX state gened by a different gening context. <" << se->cs_vector_combo_gening_it_.for_closure_state_->state_no_ << "," << se->cs_vector_combo_gening_it_.gen_vector_ << "> goto state: " << se->goto_state_->state_no_ << endl; add_state_to_conflict_states_list_if(For_gening_context,*se->goto_state_); bool gen_ok = gen_transitive_states_balance_for_closure_vector(For_gening_context,For_closure_state,*se->goto_state_); @; return gen_ok; } @*4 Create a new state. @= state* s = new state(eno,se->sr_def_element_); s->closure_state_birthing_it_ = For_gening_context.for_closure_state_; s->cs_vector_combo_gening_it_ = For_gening_context; S_VECTOR_ELEMS_ITER_type j = elem_list.begin(); S_VECTOR_ELEMS_ITER_type je = elem_list.end(); bool rt_bnded = s->crt_core_items_of_state(j,je,For_gening_context); if(rt_bnded == true){ associated_rt_bnded_cs = For_gening_context; }else{ associated_rt_bnded_cs.for_closure_state_=0; associated_rt_bnded_cs.gen_vector_=-1; } s->add_closure_rules_subrules_to_state(associated_rt_bnded_cs,*s); s->create_follow_sets_of_state(); @*4 Can new state be merged into state network? Yes then exit.\fbreak Watch for indicator to stop gening the states caused by ``not lr1 compatibile'' while gening the closure state network. @= int compatibility_result = find_2_states_compatible_and_merge(*s); switch(compatibility_result){ case MERGED:{ delete s; @; return true;// keep gening } case ABORT_GENING_STATES:{ @; return NOT_LR1_COMPATIBLE;// stop gening } case NOT_MERGED:{ break; // continue the |gen_a_state| logic by fall through } } @*3 |determine_reduced_state_type|.\fbreak Determines the ``lrness'' of the state: no conflict --- shift(s) or reduce(s) only, conflict: shift / reduce, multiple reduces, shift(s) with mutiple reduces. @+= int state::determine_reduced_state_type(state* S){ // rtned 0 so, 1 ro, 2 s/r, 3 r2 , 4 s/r2 using namespace NS_yacco2_T_enum; int no_reduces(0); int no_reduce_types(0); S_VECTORS_ITER_type svi = S->state_s_vector_.begin(); S_VECTORS_ITER_type svie = S->state_s_vector_.end(); S_VECTORS_ITER_type tvi = S->state_s_vector_.find(-T_Enum::T_T_eosubrule_); if(tvi != svie){ no_reduces += tvi->second.size(); ++no_reduce_types; } tvi = S->state_s_vector_.find(-T_Enum::T_T_called_thread_eosubrule_); if(tvi != svie){ no_reduces += tvi->second.size(); ++no_reduce_types; } tvi = S->state_s_vector_.find(-T_Enum::T_T_null_call_thread_eosubrule_); if(tvi != svie){ no_reduces += tvi->second.size(); ++no_reduce_types; } if(no_reduces > 1) no_reduces = 3; if(S->state_s_vector_.size()>no_reduce_types){// shift present if(no_reduce_types > 0){// combo ++no_reduces;// combo shift / reduce } } return no_reduces; } @*3 |add_state_to_gbl_lr1_state_tbls|.\fbreak @+= void state::add_state_to_gbl_lr1_state_tbls(state* State){ ++NO_LR1_STATES; State->state_no_ = NO_LR1_STATES; LR1_STATES.push_back(State); Voc_ENO eno = State->vectored_into_by_elem_; LR1_STATES_ITER_type i = LR1_COMMON_STATES.find(eno); if(i == LR1_COMMON_STATES.end()){ LR1_COMMON_STATES[eno] = STATES_type(); i = LR1_COMMON_STATES.find(eno); } i->second.push_back(State); } @*3 |add_state_to_conflict_states_list_if|.\fbreak If newly created state has ``eosubrule'' in its vector map, there are 2 possibilities that make it a conflict state:\fbreak \ptindent{1) reduce / reduce --- more than 1 production whose string is consumned} \ptindent{2) shift / reduce --- one reduce with a shift} @+= void state::add_state_to_conflict_states_list_if(gen_context& Gening_context,state& State){ S_VECTORS_ITER_type i = State.state_s_vector_.find(-T_Enum::T_T_eosubrule_); if(i != State.state_s_vector_.end()) goto reduce_fnd; return; reduce_fnd:; if(i->second.size() > 1){// reduce / reduce Gening_context.for_closure_state_->state_s_conflict_state_list_.push_back(&State); return; } if(State.state_s_vector_.size() < 2) return; Gening_context.for_closure_state_->state_s_conflict_state_list_.push_back(&State); } @*1 {\bf{General routines on state compatibilities}}. @*2 Determining if 2 states are equivalent?.\fbreak As an aid to quickly determine whether 2 states are equal, each state except the ``closure-state'' has an element going into it: proxies ``rule-ref'' or ``T-ref'' are references to their definitions. They are general classifications of the items whose contents refer to the specific definition. ``eosubrule'' is excluded from this as it does not generate any states. Why use the definitions? I need a unique identifier and this only comes from the definition. The ``closure-only'' (start) state has no entry. State equivalence is arrived at by:\fbreak \ptindent{1) the ``entered into'' element generating each state must be identical} \ptindent{2) State's A ``to vector'' map must be the same size as B} \ptindent{3) A's ``to vector''\ 's keys must be the same as B} \ptindent{4) A's ``to vector''\ 's state element list's contents must be the same as B} Point 4: why the state's element list ordered? One can have a state whereby its elements order are not the same but the ``to vector'' result is. Give me a real example as i'm a doubter.\fbreak \settabs\+\indent&1in\qquad&\qquad&\qquad&\cr \+&\it Rule&&{\it subrule's symbols}\cr \+&Rab &$\rightarrow$ &a b\cr \+&Rac &$\rightarrow$ &a c\cr \+&Rad &$\rightarrow$ &Rab\cr \+&&$\rightarrow$ &Rac\cr \+&Rae &$\rightarrow$ &Rac\cr \+&&$\rightarrow$ &Rab\cr Rad or Rae inside other productions should allow the merge of Rab, Rac.\fbreak \fbreak Constraints:\fbreak \ptindent{1) state's vector is a map ordered by Voc\_ENO} \ptindent{2) state elements list ordered by tree address} @+= bool state::are_states_equivalent(state& Merge_into_state,state& To_merge_state){ if(Merge_into_state.vectored_into_by_elem_ != To_merge_state.vectored_into_by_elem_){ return false; } if(Merge_into_state.state_s_vector_.size() != To_merge_state.state_s_vector_.size()){ return false; } S_VECTORS_ITER_type i = Merge_into_state.state_s_vector_.begin(); S_VECTORS_ITER_type ie = Merge_into_state.state_s_vector_.end(); S_VECTORS_ITER_type j = To_merge_state.state_s_vector_.begin(); S_VECTORS_ITER_type je = To_merge_state.state_s_vector_.end(); for(;i!=ie;++i,++j){ Voc_ENO ieno = i->first; Voc_ENO jeno = j->first; if(ieno != jeno) { return false; } if(i->second.size() != j->second.size()){ return false; } S_VECTOR_ELEMS_ITER_type l = i->second.begin(); S_VECTOR_ELEMS_ITER_type le = i->second.end(); S_VECTOR_ELEMS_ITER_type m = j->second.begin(); S_VECTOR_ELEMS_ITER_type me = j->second.end(); for(;l!=le;++l,++m){ state_element* ls = *l; state_element* ms = *m; if(ls->sr_element_ != ms->sr_element_){ return false; } } } return true; } @*2 |is_state_lr1_compatible|.\fbreak \ptindent{0) check if it's a s/r or r/r type state no exit as compatible} \ptindent{1) Fill in its lookahead per reducing subrule} \ptindent{1.5) make sure that its reducing set is not empty! or not lr1 compatible} \ptindent{2) calculate state's T shift set} \ptindent{3) do a set intersection on all these sets} The resulting set must be empty to be LR1 compatible. To be more efficient i use a T count as i do not have to report on what type of non compatibility produced it nor between whom: reduce / reduce or shift /reduce. The cost is to read each set while being gened and add up each item's referenced count: not the combinatorics between each set. Add check on eolr presence: Eg, eolr in la expression only, it is not exploded. This leads to the bug that a shift/reduce is incompatible. Somehow i must have optimized this out from |la_expr| grammar. So if there are 2 or more reduces taking place and eolr is present -> not lr1. Same goes for shift/reduce condition. @+= bool state::is_state_lr1_compatible(state& State_to_eval){ T_COUNT_type t_cnt(START_OF_RULES_ENUM); for(int x=0;xsecond.begin(); S_VECTOR_ELEMS_ITER_type je = i->second.end(); bool T_not_meta(false); for(;j!=je;++j){// fill in lookahead per reducing subrule ++no_reduces; state_element* se = *j; if (se->calc_la(*se) == false){ return NOT_LR1_COMPATIBLE; } LA_SET_ITER_type k = se->la_set_->begin(); LA_SET_ITER_type ke = se->la_set_->end(); for(;k!=ke;++k){ T_in_stbl* tintbl = *k; T_ENO teno = tintbl->t_def()->enum_id();// T of new alphabet if(teno <= END_OF_LR1_DEFS){ if(teno == LR1_EOG){ T_not_meta = true; } }else{ T_not_meta = true; } ++t_cnt[teno]; if(t_cnt[teno] > 1) return NOT_LR1_COMPATIBLE; } } if(no_reduces > 1){// reduces > 1 where eolr super set to others if(T_not_meta == true){ if(t_cnt[LR1_EOLR] > 0) return NOT_LR1_COMPATIBLE; } } // shift / reduce evaluation // shift type not of meta Tes \QUEshift, \ALLshift etc S_VECTORS_ITER_type l = State_to_eval.state_s_vector_.begin(); S_VECTORS_ITER_type le = State_to_eval.state_s_vector_.end(); for(;l!=le;++l){// T shift set of state Voc_ENO t = l->first; if(t == -T_Enum::T_T_eosubrule_) continue; if(t == -T_Enum::T_T_null_call_thread_eosubrule_) continue; if(t == -T_Enum::T_T_called_thread_eosubrule_) continue; if(t == T_Enum::T_refered_rule_) continue; S_VECTOR_ELEMS_ITER_type m = l->second.begin(); state_element* se = *m; CAbs_lr1_sym* sym = AST::content(*se->sr_element_); T_ENO tid = sym->enumerated_id__; switch(tid){ case T_Enum::T_refered_T_:{ @; T_in_stbl* tintbl = rt->t_in_stbl(); T_ENO teno = tintbl->t_def()->enum_id();// T of new alphabet if(teno <= END_OF_LR1_DEFS){ if(teno != LR1_EOG){ continue; // bypass the meta Tes } } ++t_cnt[teno]; if(t_cnt[teno] > 1) return NOT_LR1_COMPATIBLE; if(teno>END_OF_LR1_DEFS){// specific shift against eolr usuage if(t_cnt[LR1_EOLR]> 0)return NOT_LR1_COMPATIBLE; } } } } return LR1_COMPATIBLE; } @*2 |are_2_states_compatible_yes_merge|.\fbreak Is it LR1 compatible? If |To_merge_into_state|'s closured-generating state has no conflict states then it's straight sailing. To determine if the mariage is compatible, the conflict states must be checked with the new follow set info supplied by |State_for_merging| birthing ``closured context''. Let's review the LR1 conditions:\fbreak \ptindent{1) a conflict state has at least a reduce / reduce or reduce / shift condition} \ptindent{2) reduced lookaheads must be regened with the new closured follow set context} Point 2's ``new closured follow set'' context comes from the closured states of |State_for_merging|'s productions. To do the check i must merge the new follow set contexts of |To_merge_into_state|'s birth state per transitive production. Why? These are the potential follow set contexts where each production was born. Now generate the lookahead for each conflict state's reduces and then see whether all the conflict states are kosher. An incompatibility just requires rolling back the new follow set context from each of the proposed production's birthing states of |To_merge_into_state|. A kosher merge requires that the spawning state must unlink each of its spawned productions to the new state's ``goto'' and ``reduce'' links and latch into the merged state's productions linkages. \fbreak \fbreak A caveat: {\bf {infinite states gened when a merge into its own closure state network is not lr(1)}}.\fbreak Spector's {\bf {/usr/local/yacco2/qa/lr1\_sp2.lex}} grammar illustrates when the closure state 2's start gening vector:{\bf{x}} produces its own states. Now state 2's brethern vector: {\bf{y}} gets gened and has the potential to merge into one of x's states but the potential merge is not lr(1). Cuz state's 2 two closure state's vectors are different(x,y), this means that their gened states can be separate as long as the lr(1) constraint is respected. So y's states are separate from vectored:x and not merged into x's states due to lr(1) constraint. So continue gening closure state 2's y's states. If down the road gening y's states are not lr(1) then abort the generation and issue such a message. The not lr(1) condition is specific to the closure state's gening vector and not due to a merge into another's state network that is kosher. What happens when a closure state's vector is gened and one of its states is not lr(1) whether being merged into or not within itsself states network? {\bf{/usr/local/yacc2/qa/knu1\_sick.lex}} grammar illustrates this situation. Have a read of its pdf document, as it talks about the situation and how it came about. This situation is a legitimate non lr(1) grammar and so put on the brakes to generating its states. Another example is: {\bf{lr1\_sp6.lex}} where the start rule has a subrule (production) and it is epsilonable. Ditto on the reading.\fbreak \fbreak So the terminating condition is:\fbreak 1) not lr(1) within its own closure state's vector generated states\fbreak @+= int state:: are_2_states_compatible_yes_merge(state& To_merge_into_state,state& State_for_merging){ state* cs_To_merge_into = To_merge_into_state.closure_state_birthing_it_; state* cs_for_merging = State_for_merging.closure_state_birthing_it_; @; bool compatible(false); if(cs_To_merge_into->state_s_conflict_state_list_.empty() == true) goto merged; lr1_test:@/ { S_CONFLICT_STATES_ITER_type k = cs_To_merge_into->state_s_conflict_state_list_.begin(); S_CONFLICT_STATES_ITER_type ke = cs_To_merge_into->state_s_conflict_state_list_.end(); for(;k!=ke;++k){// evaluate lr1 compatibility state* s = *k; compatible = s->is_state_lr1_compatible(*s); if(compatible == NOT_LR1_COMPATIBLE) goto unwind_merge; } } merged:@/ { @; return MERGED; } unwind_merge:{@/ unwind:;@/ @; return NOT_MERGED; } return NOT_MERGED; } @*4 Add potential follow set context per production.\fbreak Only work with the core items. @= RULE_NOS_SET_type rules_to_add; S_VECTORS_ITER_type sfmi = To_merge_into_state.state_s_vector_.begin(); S_VECTORS_ITER_type sfmie = To_merge_into_state.state_s_vector_.end(); S_VECTORS_ITER_type msfmi = State_for_merging.state_s_vector_.begin(); for(;sfmi!=sfmie;++sfmi,++msfmi){ S_VECTOR_ELEMS_ITER_type rri = sfmi->second.begin(); S_VECTOR_ELEMS_ITER_type mrri = msfmi->second.begin(); S_VECTOR_ELEMS_ITER_type rrie = sfmi->second.end(); for(;rri!=rrie;++rri,++mrri){ state_element* re = *rri; state_element* mre = *mrri; if(re->closure_state_ == re->self_state_) continue;// not a core item RULE_ENO ruleno = re->find_state_element_s_rule_no(); S_FOLLOW_SETS_ITER_type j = re->closure_state_->state_s_follow_set_map_.find(ruleno); follow_element* fe = j->second; RULE_NOS_SET_ITER_type rni = rules_to_add.find(ruleno); if(rni == rules_to_add.end()){ fe->add_merge_closure_info(*mre->closure_state_); rules_to_add.insert(ruleno); } } } @*4 Relink spawning state of merged state. @= S_VECTORS_ITER_type ri = State_for_merging.state_s_vector_.begin(); S_VECTORS_ITER_type rie = State_for_merging.state_s_vector_.end(); S_VECTORS_ITER_type si = To_merge_into_state.state_s_vector_.begin(); for(;ri!=rie;++ri,++si){ S_VECTOR_ELEMS_ITER_type rri = ri->second.begin(); S_VECTOR_ELEMS_ITER_type rrie = ri->second.end(); S_VECTOR_ELEMS_ITER_type ssi = si->second.begin(); for(;rri!=rrie;++rri,++ssi){ state_element* re = *rri; state_element* se = *ssi; if(re->previous_state_element_ == 0) continue; state_element* prev_re = re->previous_state_element_; prev_re->goto_state_ = se->self_state_; prev_re->reduced_state_ = se->reduced_state_; prev_re->next_state_element_ = se; // walk thru the rhs backwards laying those reduce eggs for(prev_re = prev_re->previous_state_element_; prev_re != 0;prev_re = prev_re->previous_state_element_){// deposit reducing state prev_re->reduced_state_ = se->reduced_state_; } } } @; @*4 Add conflict states to merge network.\fbreak For now add all the conflict states until i refine a map of states that derives each specific conflict state. If the merged state is part of the current network being gened then bypass. Why? The conflict states are already registered that appliy to this merged state cuz its part of this closured generating state containing the conflist state list. @= if(cs_for_merging->cs_vector_combo_gening_it_.for_closure_state_ != cs_To_merge_into->cs_vector_combo_gening_it_.for_closure_state_){ S_CONFLICT_STATES_ITER_type k = cs_To_merge_into->state_s_conflict_state_list_.begin(); S_CONFLICT_STATES_ITER_type ke = cs_To_merge_into->state_s_conflict_state_list_.end(); for(;k!=ke;++k){ state* s = *k; cs_for_merging->state_s_conflict_state_list_.push_back(s); } } @*4 Unwind potential merge. @= RULE_NOS_SET_type rules_to_add; S_VECTORS_ITER_type sfmi = To_merge_into_state.state_s_vector_.begin(); S_VECTORS_ITER_type sfmie = To_merge_into_state.state_s_vector_.end(); for(;sfmi!=sfmie;++sfmi){ S_VECTOR_ELEMS_ITER_type rri = sfmi->second.begin(); S_VECTOR_ELEMS_ITER_type rrie = sfmi->second.end(); for(;rri!=rrie;++rri){ state_element* re = *rri; if(re->closure_state_ == re->self_state_) continue;// not a core item RULE_ENO ruleno = re->find_state_element_s_rule_no(); S_FOLLOW_SETS_ITER_type j = re->closure_state_->state_s_follow_set_map_.find(ruleno); follow_element* fe = j->second; RULE_NOS_SET_ITER_type rni = rules_to_add.find(ruleno); if(rni == rules_to_add.end()){ fe->remove_merge_closure_info(); rules_to_add.insert(ruleno); } } } @*2 |find_2_states_compatible_and_merge|.\fbreak Read the |LR1_COMMON_STATES| table looking for states with the same enumerate and same state core items. @+= int state::find_2_states_compatible_and_merge(state& State_for_merging){ Voc_ENO eno = State_for_merging.vectored_into_by_elem_; LR1_STATES_ITER_type i = LR1_COMMON_STATES.find(eno); if(i == LR1_COMMON_STATES.end()){ return NOT_MERGED; } STATES_ITER_type j = i->second.begin(); STATES_ITER_type je = i->second.end(); for(;j!=je;++j){ state* s = *j; bool equivalent = are_states_equivalent(State_for_merging,*s); if(equivalent == false) continue; int compatible = are_2_states_compatible_yes_merge(*s,State_for_merging); if(compatible == LR1_COMPATIBLE) return MERGED; if(compatible == ABORT_GENING_STATES) return ABORT_GENING_STATES; } return NOT_MERGED; } @*2 |are_gened_states_lr1_compatible|.\fbreak Read gening state's conflict states for lr1 health check. @+= bool state::are_gened_states_lr1_compatible(){ S_CONFLICT_STATES_ITER_type i = state_s_conflict_state_list_.begin(); S_CONFLICT_STATES_ITER_type ie = state_s_conflict_state_list_.end(); if(i == ie) return LR1_COMPATIBLE; for(;i!=ie;++i){ state* s = *i; if(is_state_lr1_compatible(*s) == NOT_LR1_COMPATIBLE) return NOT_LR1_COMPATIBLE; } return LR1_COMPATIBLE; } @*2 |is_str_rt_bnded|.\fbreak Determine if the production's lookahead transitions thru other closured state's follow sets.\fbreak Read the production's string positioned by it's passed parameter. The right bounded condition is whether the last symbol in the string is a rule before the ``eosubrule''. This demands that the closure productions caused by this rule within the state must also be generated as part of the being gened closure state. A subtle condition arises when the string's balance of symbols contains only rules that are all epsilonable. This moves the right bounded condition into the interior of the production currently positioned. Why? Epsilon is a window that allows one to see past its contents into the next adjacent symbol. This is inductive as it moves right thru all the epsilon rules to the end-of-string. Thus each rule's follow set strings transitizes rightward along the production's string to its end and then up the follow sets of its spawning closure rule's environment.\fbreak \fbreak Please note, the thread call variants do not support right-bounded expressions. U'll never have a rule following the thread call prase --- this is a parsing error. @+= bool state::is_str_rt_bnded(AST* Str){ AST* rstr_t = AST::brother(*Str);// get next node if(rstr_t == 0) return false;// eosubule passed in for verification CAbs_lr1_sym* sym = AST::content(*rstr_t); Voc_ENO id = sym->enumerated_id__; if(id == T_Enum::T_T_eosubrule_){ CAbs_lr1_sym* rsym = AST::content(*Str); if(rsym->enumerated_id__ == T_Enum::T_refered_rule_){ return true; } } return is_str_epsilonable(rstr_t); } @*2 |is_str_epsilonable|.\fbreak The symbol string passed is one to the right of the current item within the state called the ``follow string''. From this point within the production string, the balance of the symbols must be composed of ``refered-rule'' having the epsilon attribute. Remember: the grammar is internally represented by a tree where each node of a production contains a ``refered-rule'', ``refered-T'', or an ``eosubrule''. Thread calls is not applicable. It is never epsilonable. @+= bool state::is_str_epsilonable(AST* Str){ AST* str_t = Str; for(;str_t != 0;str_t = AST::brother(*str_t)){ CAbs_lr1_sym* sym = AST::content(*str_t); Voc_ENO id = sym->enumerated_id__; @; } return true; } @*3 Is str's element epsilon?.\fbreak The tree node's symbol must be a rule or ``eosubrule''. ``eosubrule'' indicates either the empty string (epsilon) where it is the only symbol within the production or the end of the production's string of symbols. @= switch(id){ case T_Enum::T_refered_rule_:{ @; rule_def* rd = rr->its_rule_def(); if(rd->epsilon() != true) return false; break; } case T_Enum::T_T_eosubrule_:{ return true; } case T_Enum::T_refered_T_:{ return false; } } @*1 {\bf{Follow set routines}}. @*2 |add_rule_to_follow_list|. @+= void state::add_rule_to_follow_list(RULE_ENO Refered_rule){ FOLLOW_RULES_ITER_type i = follow_rule_list_.find(Refered_rule); if(i == follow_rule_list_.end()){ follow_rule_list_.insert(Refered_rule); } } @*2 |create_follow_sets_of_state|.\fbreak The list contains the rule's enumerate value. This is used to get the |state_s_vector_[rule no]| context that makes up the state. It supplies the context in the subrule(s) where it resides. The follow set is calculated from the symbol string(s) to the right of these subrule context. @+= void state::create_follow_sets_of_state(){ if(follow_rule_list_.empty() == true) return; RULE_ENO eno(-1); FOLLOW_RULES_ITER_type i = follow_rule_list_.begin(); FOLLOW_RULES_ITER_type ie = follow_rule_list_.end(); for(;i!=ie;++i){// create initial follow set map entries follow_element* fe = new follow_element(this); eno = *i; fe->rule_no_ = eno; state_s_follow_set_map_[eno] = fe; } i = follow_rule_list_.begin(); for(;i!=ie;++i){// read rules eno = *i; S_VECTORS_ITER_type j = state_s_vector_.find(eno); S_FOLLOW_SETS_ITER_type fsi = state_s_follow_set_map_.find(eno); follow_element* fe = fsi->second; S_VECTOR_ELEMS_ITER_type k = j->second.begin(); S_VECTOR_ELEMS_ITER_type ke = j->second.end(); for(;k!=ke;++k){// read subrule context state_element* se = *k; @; AST* follow_str_t = AST::brother(*se->sr_element_); deal_with_follow_str_sym:; fe->rule_def_t_ = rr->its_rule_def()->rule_s_tree(); fe->add_follow_set_contributor(follow_str_t); CAbs_lr1_sym* sym = AST::content(*follow_str_t); Voc_ENO id = sym->enumerated_id__; @; } } } @ @= switch(id){ case T_Enum::T_refered_rule_:{ @; rule_def* rd = rr->its_rule_def(); if(rd->first_set()->empty() != true){ FIRST_SET_ITER_type a = rd->first_set()->begin(); FIRST_SET_ITER_type ae = rd->first_set()->end(); for(;a!=ae;++a){ T_in_stbl* t = *a; if(fe->follow_set_.find(t)== fe->follow_set_.end()){ fe->follow_set_.insert(t); } } } if(rd->epsilon() == true){// next follow str symbol follow_str_t = AST::brother(*follow_str_t); goto deal_with_follow_str_sym; } break; } case T_Enum::T_T_eosubrule_:{ @; fe->add_follow_set_transition(*se,*eos); break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ break; } case T_Enum::T_T_called_thread_eosubrule_:{ break; } case T_Enum::T_refered_T_:{ @; fe->follow_set_.insert(rt->t_in_stbl()); break; } } @ @= @=refered_rule* rr = (refered_rule*)AST::content(*se->sr_element_);@>@/ @ @= @=refered_rule* rr = (refered_rule*)sym;@>@/ @ @= @=refered_T* rt = (refered_T*)sym;@>@/ @ @= @=T_eosubrule* eos = (T_eosubrule*)sym;@>@/ @ @= @=T_null_call_thread_eosubrule* eos = (T_null_call_thread_eosubrule*)sym;@>@/ @ @= @=T_called_thread_eosubrule* eos = (T_called_thread_eosubrule*)sym;@>@/ @*2 |crt_start_rule_s_follow_set|.\fbreak What is the follow set for the Start rule and should there be one anyway? There is only the empty string so what's it follow string? Empty? Here's the scoop. A monolithic grammar force an ``eolr'' terminal. This covers the end-of-the grammar process and homogenizes it in the same manner as threaded grammars. For the thread grammar use its calculated ``parallel-la-boundary'' expression. This is its follow set. The second question raised allows the Start rule's productions to reduce as accepted. @+= void state::crt_start_rule_s_follow_set(AST& Start_rule){ CAbs_lr1_sym* sym = AST::content(Start_rule); @=rule_def* rd = (rule_def*)sym;@>; RULE_ENO rno = rd->enum_id(); follow_element* fe = new follow_element(this); fe->rule_no_ = rno; state_s_follow_set_map_[rno] = fe; fe->rule_def_t_ = rd->rule_s_tree(); if(O2_PP_PHASE == 0){// monolithic: use eolr not eog fe->follow_set_.insert(STBL_T_ITEMS[LR1_EOLR]); }else{// thread: use calculate lookahead expression T_parallel_parser_phrase* pp_ph = O2_PP_PHASE; T_parallel_la_boundary* la = pp_ph->la_bndry(); LA_SET_ITER_type i = la->la_first_set()->begin(); LA_SET_ITER_type ie = la->la_first_set()->end(); for(;i!=ie;++i){ T_in_stbl* t = *i; FOLLOW_SETS_ITER_type j = fe->follow_set_.find(t); if(j == fe->follow_set_.end()){ fe->follow_set_.insert(t); } } } } @*1 {\bf{For tracing facilities}}. @*2 |entry_symbol_literal|.\fbreak @+= const char* state::entry_symbol_literal(){ if(vectored_into_by_elem_sym_ == 0) return " No symbol"; @; } @*3 Return entry symbol literal.\fbreak Eases tracing of lr states easier instead of just displaying its enumerated value. @= Voc_ENO id = vectored_into_by_elem_sym_->enumerated_id__; CAbs_lr1_sym* sym = vectored_into_by_elem_sym_; switch(id){ case T_Enum::T_rule_def_:{ @=rule_def* rd = (rule_def*)sym;@>@/ return rd->rule_name()->c_str(); } case T_Enum::T_T_eosubrule_:{ return "eos"; break; } case T_Enum::T_T_null_call_thread_eosubrule_:{ return "null-eos"; break; } case T_Enum::T_T_called_thread_eosubrule_:{ return "called-thd-eos"; break; } case T_Enum::T_T_terminal_def_:{ @=T_terminal_def* td = (T_terminal_def*)sym;@>@/ return td->t_name()->c_str(); } default:{// cuz apple's latest compiler: symantic analysis error-never reached return "null-eos"; break; } } @** Emit FSA, FSC, and Documents of grammar.\fbreak Finally. This is a hodge/podge of routines to emit the ``cpp'' code, ``first set control'' file for \o2{}linker, and various documents to be run thru the ``mpost, cweave, and pdftex' stages. A caution, under my Sun Solaris system, the ``Adobe 3rd party Readers'' like ``gpdf and xpdf'' occassionally throw tantrums. I'll look into these after finishing \O2. @*3 Output enumeration header file. @= lrclog << "Output enumeration header file " << endl; OP_ENUMERATION_HEADER(Error_queue); @; @*3 Output Errors file. @= if(ERR_SW == 'y'){ lrclog << "Output Errors vocabulary files " << endl; OP_ERRORS_HEADER(Error_queue); OP_ERRORS_CPP(Error_queue); @; } @*3 Output User Terminals files. @= if(T_SW == 'y'){ lrclog << "Output User Terminal vocabulary files " << endl; OP_USER_T_HEADER(Error_queue); OP_USER_T_CPP(Error_queue); @; } @*3 Output grammar header file. @= lrclog << "Output grammar header file " << endl; OP_GRAMMAR_HEADER(Error_queue); @; @*3 Output grammar cpp file. @= lrclog << "Output grammar cpp file " << endl; OP_GRAMMAR_CPP(Error_queue); @; @*3 Output grammar sym file. @= lrclog << "Output grammar sym file " << endl; OP_GRAMMAR_SYM(Error_queue); @; @*3 Output grammar tbl file. @= lrclog << "Output grammar tbl file " << endl; OP_GRAMMAR_TBL(Error_queue); @; @*3 Output T-alphabet file. @= if(T_SW == 'y' || ERR_SW == 'y'){ lrclog << "Output User Terminal vocabulary files " << endl; OP_T_Alphabet(Error_queue); @; } @*3 Documents --- Grammar's Cweb and Mpost diagrams, and Cross references.\fbreak @= if(PRT_SW == 'y'){ @ @ } @*4 Emit Cweb and Mpost diagrams.\fbreak This is my attempt at producing grammar reports by |mpost| diagrams and |cweave|. Of course, this is recursive in that cweb uses |cweave| to generate code with a simple attempt at grammar's railroad diagrams. To do this, a new comment type was created --- ``/@@'' -- ``@@/'' to include the |cweb| directives declaring how the document's formatting will look like with its table of contents. The comment is a play on |cweave|'s directives. They can be sprinkled thru out the grammar except within the syntax-directed code constructs as u know, this is a character at a time lex crawler with little knowledge of c$++$ syntax apart from comments, literals, and strings. Once u become familiar with |cweave|, u'll never go back to base comments --- aka c$++$. The files emitted take on the file naming format of:\fbreak \ptindent{1) grammar name without its extension ``.lex'' for the filename body} \ptindent{2) ``.mp'' mpost extension} \ptindent{3) ``.w''cweb extension} As an example, a grammar of ``eol.lex'' would produce the |mpost| file 'eol.mp' and |cweb| file `eol.w'. These files are run thru |mpost| and |cweave| followed by |pdftex| to produce the 'pdf' grammar document. |mpost| generates from 'eol.mp' its figure files with numeric extensions like 'eol.1' that are referenced in the |cweb| file `eol.w' using the |convertMPtoPDF| macro. @= using namespace yacco2; using namespace NS_yacco2_k_symbols;@/ using namespace NS_mpost_output;@/ lrclog << "Emit grammar's railroad diagrams for Mpost" << endl;@/ set cweb_k_filter;@/ cweb_k_filter.insert(T_Enum::T_T_cweb_comment_); tok_can_ast_functor mpost_just_walk_functr; ast_prefix mpost_rule_walk(*GRAMMAR_TREE,&mpost_just_walk_functr,&cweb_k_filter,BYPASS_FILTER); tok_can mpost_rules_can(mpost_rule_walk); Cmpost_output mpost_fsm; T_fsm_phrase* fsm_ph = O2_FSM_PHASE; mpost_fsm.grammar_filename_prefix_ += fsm_ph->filename_id()->identifier()->c_str(); mpost_fsm.fq_filename_noext_ += o2_fq_fn_noext.c_str(); Parser mpost_rules(mpost_fsm,&mpost_rules_can,0,0,&Error_queue); mpost_rules.parse(); @*4 Print out xref docs.\fbreak \ptindent{1) cross reference of used symbols: grammar's vocabulary} \ptindent{2) grammar's rules first sets} @= using namespace NS_prt_xrefs_docs; yacco2::lrclog << "----- Print xref docs -----" << std::endl; @=set prt_xrefs_docs_filter;@>@/ prt_xrefs_docs_filter.insert(T_Enum::T_rule_def_); prt_xrefs_docs_filter.insert(T_Enum::T_T_subrule_def_); prt_xrefs_docs_filter.insert(T_Enum::T_refered_T_); prt_xrefs_docs_filter.insert(T_Enum::T_refered_rule_); prt_xrefs_docs_filter.insert(T_Enum::T_T_called_thread_eosubrule_); prt_xrefs_docs_filter.insert(T_Enum::T_T_null_call_thread_eosubrule_); prt_xrefs_docs_filter.insert(T_Enum::T_T_eosubrule_); tok_can_ast_functor xrefs_docs_walk_functr; ast_prefix prt_xrefs_docs_walk(*rules_tree ,&xrefs_docs_walk_functr,&prt_xrefs_docs_filter,ACCEPT_FILTER);@/ tok_can prt_xrefs_docs_can(prt_xrefs_docs_walk); Cprt_xrefs_docs prt_xrefs_docs_fsm; prt_xrefs_docs_fsm.grammar_filename_prefix_ += mpost_fsm.grammar_filename_prefix_; prt_xrefs_docs_fsm.fq_filename_noext_ += o2_fq_fn_noext.c_str(); Parser prt_xrefs_docs(prt_xrefs_docs_fsm,&prt_xrefs_docs_can,0,0,&Error_queue); prt_xrefs_docs.parse(); @*3 Emit FSC file.\fbreak Let the linker know what's happening with the grammar's first set for the linker. @= OP_FSC_FILE(Error_queue); @*2 Driver to emit FSA, FSC, and Documents of grammar. @= //|@;| @; @; @; @; @; @; @; @; @; @; @** Print routines. @*2 Print tree structure of rules. @= yacco2::lrclog << "Tree dump of grammar" << std::endl; prt_ast_functor prt_functr(&PRINT_RULES_TREE_STRUCTURE); ast_prefix pre(*GRAMMAR_TREE,&prt_functr); while (pre.base_stk_.cur_stk_rec() != 0){ pre.exec(); } @*2 Print grammar tree. @= yacco2::lrclog << "Grammar Tree dump" << std::endl; prt_ast_functor prt_grammar_functr(&PRINT_GRAMMAR_TREE); ast_prefix prefix_grammar(*GRAMMAR_TREE,&prt_grammar_functr); tok_can pt_can(prefix_grammar); int n(-1); for(;pt_can[++n] != NS_yacco2_k_symbols::PTR_LR1_eog__;); @*2 Print dump common states. @= LR1_STATES_ITER_type ci = LR1_COMMON_STATES.begin(); LR1_STATES_ITER_type cie = LR1_COMMON_STATES.end(); yacco2::lrclog << "Common States dump" << std::endl; int cstate_no(0); for(;ci!=cie;++ci){ ++cstate_no; STATES_ITER_type si = ci->second.begin(); STATES_ITER_type sie = ci->second.end(); bool pre = false; for(;si!=sie;++si){ state* cstate = *si; if(pre == false){ pre = true; yacco2::lrclog << cstate_no << "::Common State: " << cstate->vectored_into_by_elem_; if(cstate->vectored_into_by_elem_sym_ == 0){ yacco2::lrclog << " Entry Symbol: No symbol" << std::endl; }else{ yacco2::lrclog << " Entry Symbol: " << cstate->entry_symbol_literal() << std::endl; } } yacco2::lrclog << "\tstate no: " << cstate->state_no_ << std::endl; } yacco2::lrclog << std::endl; } @*2 Print dump state. @= const char* literal_name; yacco2::lrclog << std::endl; yacco2::lrclog << "State dump" << std::endl; si = LR1_STATES.begin(); sie = LR1_STATES.end(); for(;si!=sie;++si){ state* cur_state = *si; Print_dump_state(cur_state); } @*2 Some tracing facilities. @*3 Increment and printout Recursion counter. @= @; @; @*3 Increment Recursion counter. @= ++RECURSION_INDEX__; @*3 Printout Recursion counter. @= for(int y__(1);y__<=RECURSION_INDEX__;++y__) lrclog << '.'; @*3 Decrement Recursion counter. @= --RECURSION_INDEX__; @** Write out |o2_defs.cpp| Structure implementations.\fbreak @(o2_defs.cpp@>= #include "o2.h" @;