1 Introduction
In 1968, Lindenmayer lindenmayer_lsystems proposed a grammar system later called Lindenmayer systems (Lsystems) to model multicellular growth in plants. They consist of rewriting rules that are used to replace, in parallel, every symbol in a string with a word. The process of replacing all symbols with words in this manner is called a derivation step. Starting from an initial word, derivation steps iterate, thereby producing a sequence of strings. The strings can produce a visual simulation or model by interpreting the symbols as instructions, such as “draw a line”, or “turn left/right” (e.g., drp_modeling_by_hand ; beauty ; galarreta_s0lbloodvessel ; rongier_sedimentary ) with each string being one step of the temporal simulation. Lsystems can be deterministic, implying that the simulation always proceeds the same way, with each string uniquely determined by the previous string; or stochastic, where the rules and derivations have a probability of occurring.
Lsystems of different types have been particularly successful at modelling plant growth beauty ; ben_naoum_surveryLsystems . Stochastic models have successfully modeled different plant structures and processes beauty ; agu1985stochastic , including Japanese Cypress trees nishida_k0l . As will be discussed, the algorithm presented in this paper is domain agnostic, and so may have applications in other diverse research domains where stochastic Lsystems have been successfully used. These other domains include modelling protein secondary structure danks_s0lproteinfolding , arterial branching galarreta_s0lbloodvessel and sedimentary formations rongier_sedimentary . The process for finding a stochastic Lsystem has been described by Galarreta et al. galarreta_s0lbloodvessel as requiring “tedious and intricate handwork” that could be improved by an algorithm to “infer rules and parameters automatically from real …images”.
Currently, models using S0Lsystems are generated predominantly by hand by experts (e.g., beauty ; galarreta_s0lbloodvessel ; rongier_sedimentary ; nishida_k0l ), which is time consuming. In danks_s0lproteinfolding , automatic inference of S0Lsystems was discussed specifically for the application of modeling protein folding. Their approach is domain specific as they use a priori scientific knowledge regarding proteins as the basis for constructing the Lsystem’s rules. Additionally, often the resulting models from existing approaches are assessed aesthetically danks_s0lproteinfolding ; rongier_sedimentary ; nishida_k0l , which further reinforces that such approaches are domain specific as an aesthetic assessment can rarely (if ever) be transferred to another process. These two drawbacks hinder automatic inference.
One approach for inferring an Lsystem from one or more temporal sequences of images is to divide the problem into two steps: 1) a segmentation of each image into the letters such that they would approximately draw the image in a simulator, 2) the inference of an Lsystem from the sequences of strings. This work is concerned with the second of these steps, which is called the inductive inference of an Lsystem. This paper presents a generalized algorithm for inductively inferring S0Lsystems from one or more sequences of strings, called the Plant Model Inference Tool for Stochastic Contextfree Lsystems (PMITS0L), which despite the name is domain independent. PMITS0L could possibly reduce the time and effort required to produce a model of a process from imagery. Furthermore, since PMITS0L uses no a priori information, it can also help reveal the mechanisms underlying a process (perhaps hidden from the images themselves), and thereby have additional scientific impact.
Inductive inference was defined in the 1970’s, and studied briefly in the theoretical computer science community doucet_algebra ; herman . It has recently been studied for deterministic contextfree Lsystems bernard_pmitml ; mcquillan_poly where it was found that all systems in a test suite of Lsystems found in the literature could be inferred accurately, each in under seconds. Inference of Lsystems generally was surveyed in ben_naoum_surveryLsystems .
One of the challenges with S0Lsystem inference, in comparison to deterministic Lsystems, is the multitude of systems that may produce the sequences of strings. For PMITS0L, it is argued within that given a set of Lsystems that can produce the string sequences, the best choice (in absence of any additional information) is an S0Lsystem that has the greatest probability of having produced the input strings. As such, the core concept for PMITS0L’s algorithm is a greedy selection process that chooses rewriting rules such that after each choice, the result would be the S0Lsystem that locally maximizes the probability of producing the input strings.
In this paper, the implementation of PMITS0L is described. It uses a greedy algorithm hybridized with a search algorithm (exhaustive search and genetic algorithm are evaluated as the search algorithms). Then, an evaluation is presented for PMITS0L at inferring S0Lsystems using
procedurally generated S0Lsystems. These systems were generated using statistical properties of existing Lsystems in order to be equally as complex as Lsystems created by experts. Procedurally generated Lsystems were used since only one S0Lsystem could be found explicitly in the literature nishida_k0l (other papers danks_s0lproteinfolding ; galarreta_s0lbloodvessel ; rongier_sedimentary created them but do not include it in their respective paper). The limited number of stochastic Lsystems in the literature is due to the difficulty in constructing them (nishida_k0l took an entire lengthy paper to create and justify theirs as will be described in Section 2), which would be dramatically improved by an automated approach. PMITS0L is also evaluated on this published S0Lsystem for Japanese Cypress nishida_k0l . Additionally, the effects of having sequences of strings as input for different values of is investigated with respect to differences between the true S0Lsystem and the S0Lsystem provided as a solution by PMITS0L.Scenarios where more than one sequence are used are important as, for example, it would be possible to have imagery from many plants of the same genotype of a species (or different genotypes of a species), and to desire a stochastic model to describe the population of plants. Furthermore, it would be particularly useful to compare populations of plants by comparing the models. Additionally, stochastic Lsystem models that can describe a diversity of plants are useful indirectly to improve image recognition tasks. For example, Ubbens et al. drp_rosette use a large set of synthetic Lsystem images of Arabidopsis thaliana
rosettes to augment the size of a data set used to train a deep convolutional neural network for the purposes of leaf counting. This was found to improve leaf counting on real rosette images versus only training using real images.
This work is an extension of the conference paper bernard_pmitsol that had the additional restrictions of only using one sequence of strings as input, and it had a limitation with respect to not having two successors of the same symbol with one being a prefix of the other — the socalled prefix limitation. This paper additionally presents methods for removing both limitations. PMITS0L is evaluated using a variety of performance metrics. Detailed in Section 3, the success rate describes the percentage of times that an S0Lsystem is found that has an equal or greater probability of producing the input than the original S0Lsystem has of producing the input. Thus, if the original Lsystem was defined as the correct Lsystem, then it can be possible to find Lsystems that are more likely to have generated the input, and are therefore better than the Lsystem which actually generated the strings. The time taken to find a solution (or report that none exists) is also measured. Additional metrics are used to capture the differences between the predicted solution and the original system, both in terms of rewriting rules and probabilities.
The remainder of this paper is structured as follows. Section 2 provides some background information on Lsystems and some applications of S0Lsystems. Section 3 discusses some of the unique challenges of inferring stochastic Lsystems, and how PMITS0L functions. Section 4 provides the methodology for evaluating PMITS0L, focusing on the methods used to evaluate multiple sequences of strings as input. Section 5 provides the results of the evaluation. Finally, Section 6 concludes the paper and discusses future directions of PMITS0L and inferring Lsystems.
2 Background and Preliminaries
To start, some basic formal notation is required. An alphabet is a finite set of symbols. Given an alphabet , is the set of all words over . For a word , the length of is denoted by . And for a finite set , the number of elements in is denoted by .
Formally, contextfree Lsystems are described by an ordered tuple where is an alphabet, is a finite set of strings where each , is a word in called an axiom (some definitions have only one axiom), and is a finite set of productions (also referred to as rewriting rules, although the term productions will be used herein). Each production is of the form where is called the predecessor and is called the successor of the production (or a successor of ). The system is said to be nonerasing (also called propagating in the Lsystems literature) if is not the empty word for any production. The term contextfree here means that the neighboring symbols in the string do not affect the selection of a successor. A derivation step is defined by, , if , , and , for . The system is called deterministic if only contains one string, and contains exactly one rule with each symbol in ; with deterministic systems, a derivation step on a string involves taking, in parallel, every symbol in the string and replacing it with its unique successor. In a stochastic Lsystem, every has a set of one or more successors each with an associated probability of being selected, such that the sum of the associated probabilities for each equals . When performing a derivation step with a stochastic Lsystem, for each symbol in a string, a successor is chosen from the set of corresponding successors with the associated probability. Formally, a stochastic contextfree, or S0Lsystem, eichorst_stochastic is a quadruple , where is a contextfree Lsystem, is a function from to such that, for all , (that describes the probability of applying a production), and is a function from to such that (that describes the probability of starting with an axiom).
In eichorst_stochastic , the authors continue with the following definitions for derivations of an S0Lsystem. Given as words in where , a derivation of to of length consists of two items:

a trace, which is a sequence of words such that ,

a function from the set of pairs into such that, for from to , if , then where for from to .
Thus, the function describes the productions applied to each letter in the derivation. Given such a derivation , the probability of deriving , is . Further, the probability of occurring is . Lastly given a finite set of derivations , the probability of them occurring is
(1) 
As mentioned in Section 1, modeling with Lsystems is done by associating a meaning to each symbol (or a subset of symbols) in ; typically the meaning is an instruction for simulation software such as the “virtual laboratory” vlab . So, a string of symbols is then taken as a sequence of instructions, and each derived word is taken as the next step in a temporal process. Symbols may have graphical and/or mechanical functions within the resulting model. A common graphical interpretation is the socalled turtle graphics beauty , which imagines a turtle in a 2D or 3D space with a state consisting of a position and orientation. The graphical symbols then modify the turtle’s state. When the turtle moves forward, it may optionally simultaneously draw a line. For branching models, two graphical symbols (“[” and “]”) will push and pop the turtle’s state onto a stack and switch to that state. Other symbols are used to represent components or an underlying mechanism in the model. For example, a symbol may be used to represent the apex of a plant where the stem will continue to grow at the next derivation step, until it flowers; this can be modeled stochastically beauty .
Nishida nishida_k0l investigated using S0Lsystems to model Japanese Cypress trees. He did not use any a priori biological knowledge of Japanese Cypress, and instead used a process of converting imagery to segments, and then to an Lsystem. Importantly, this is the same process as our main goal (images to segments to Lsystems, with PMITS0L being useful for the second step), except their work was done completely manually. In the paper, the meticulous process is described of segmenting images of Japanese Cypress by hand. The segments are then mapped to letters of an alphabet, with successors and associated selection probabilities picked for the segments such that they would reproduce the appropriate segmentation in the next image. The result is an S0Lsystem with symbols and a total of productions (shown in Table SD1 in the supplementary materials). The system produces imagery similar to the photos of the Japanese Cypress, as seen in Figures 0(a) to 0(d) nishida_k0l . This shows that the goal of automatic inference in exactly this fashion is an exciting opportunity, as doing so manually requires a huge amount of effort.
3 Inferring S0LSystems using Greedy Algorithm
This section will describe the methodology of PMITS0L. For the remainder of this paper let be the input, where each is a finite sequence of strings over . Furthermore, let , (each is a string in ). In this paper, we assume (all sequences can be truncated to have the same number of strings), and we denote this size by . Inferring an Lsystem can be stated as outputting some Lsystem (or reporting that none exists) that could produce all the sequences in ; that is, has a derivation with a trace of , for each , . In this case, is said to be compatible with . To do this, the algorithm scans each sequence in and attempts to determine a derivation, which has a trace of in a wordbyword fashion. Each time it determines part of a derivation (e.g., that some specific occurrence of a letter in produces the subword between two positions of ), it adds this production to the current list of productions if it has not already been added.
3.1 Complications with Stochastic Inference
If a compatible Lsystem does exist, then it must be found in the space of all Lsystems, and hence it is at least possible to search for one. PMITD0L bernard_pmitdv3 ; bernard_pmitml is an existing tool for inferring deterministic contextfree Lsystems (D0Lsystems). The first PMITD0L implementation bernard_pmitdv3 used genetic algorithm, and searched for successors as an ordered sequence of symbols in a search space that was pruned using mathematical properties based on necessary conditions of Lsystems. A significant improvement was made when it was recognized that every successor of a symbol must be a subword of every word directly after one where occurs (for a deterministic Lsystem is enough since the word produced at each step is uniquely determined), and that searching for an ordered sequence of symbols could be replaced by searching for successor lengths for each letter of the alphabet bernard_pmitml ; mcquillan_poly . With each possible solution consisting of a successor length for each , it is possible to scan every symbol in each word in starting from the first word, and take the subword of the associated length as the successor. When a scan can be completed without any inconsistencies, then the resulting successors are compatible with . This is referred to as the scanning process. This approach works in the deterministic case because any information deduced about the successor for any instance of in must be true for every that appears in every word in , and therefore this search space has only dimensions.
For stochastic Lsystems, while it still true that every must produce a subword of the next string and hence some kind of encoding scheme with successor lengths is possible, it is no longer true that deducing a fact about some instance of is true for all other instances of . Indeed, with S0Lsystems, different instances of each (of each word of each sequence) in can produce different successors. For , then since every instance of a symbol can produce a different successor, the most intuitive solution space defines at least one dimension (for example, using the successor length encoding scheme from bernard_pmitml ; mcquillan_poly ) for every symbol in each word of each sequence to produce a search space with dimensions, where . This is clearly intractable as an extra dimension is needed for every additional symbol in . Furthermore, reducing the bounds for the dimensions (e.g., upper and lower bounds on successor length) is very difficult since there is little information upon which to deduce such bounds (due to the aforementioned issue that every instance of a symbol can produce a new successor). The lower bounds of each successor length is (productions are assumed to be nonerasing in this paper) and the upper bound for an instance of would be the length of the next word minus the number of symbols in the previous word (since every symbol produces at least one symbol) directly after one where that occurs. In summary, to search for an S0Lsystem with the entire search space in a similar fashion to what has been done for D0Lsystems, requires both a number and scale to the dimensions that would be too large to allow for a practical search time.
A further complication when inferring S0Lsystems is there exist a multitude of possible S0Lsystems that are compatible with . By taking each pair of consecutive words , in every of , then every way of dividing into subwords can be used to create productions that lead to a compatible S0Lsystem. However, one crucial and distinguishing property for any of these candidate solutions is the probability that it produces . A best possible solution compatible with is desired rather than an arbitrary solution.
3.2 Methodology with PMITS0L
PMITS0L infers an S0Lsystem based on by selecting successors, such that each choice locally maximizes the probability of producing the words of scanned so far. This paper investigates inferring an S0Lsystem by scanning the words symbolbysymbol, and choosing each successor based on a successor length (similarly to PMITD0L in bernard_pmitml ; mcquillan_poly ) by preferring successors that have been previously selected using a greedy algorithm.
To determine an S0Lsystem, an alphabet, axioms, and rewriting rules with associated probabilities need to be predicted. The alphabet is found easily by recording every unique symbol in , and so we henceforth assume it is known. For the S0Lsystem, multiple axioms are assumed. In particular, the set of all axioms is assumed to be the set of all of the first words of sequences in . We do not attempt to infer the probability of starting with each axiom (which could just be calculated as the number of times each axiom is the first word of a sequence in divided by ). Also, computing a reduced number or a single axiom is an area for future investigation. The process for inferring the rewriting rules and their probabilities is more involved and is described in remainder of this section.
As mentioned earlier, for any , there are a multitude of possible compatible S0Lsystems, but a best solution is a S0Lsystem with the greatest probability of producing bernard_pmitsol . In absence of any additional information (e.g., a priori knowledge that might lead to a different choice), the S0Lsystem with the greatest computed probability is said to have maximum parsimony. For example, Table 1
shows two abstracted S0Lsystems, where in parentheses is the number of times that each successor was applied to produce a sequence of strings in the two different derivations. The “Odds” column shows the computed probability of the derivation occurring. It can be seen that the first S0Lsystem has a greater probability of producing the sequence of strings, and so should be preferred as the solution. The probability that an S0Lsystems produced
is increased when one or a few successors have a high probability, as opposed to an equal distribution across all of the successors. This mathematical property provides the guidance for a greedy algorithm to infer an S0Lsystem using the process described next.Successors  Odds 

Conceptually, the core greedy algorithm component is relatively straightforward. Suppose that a list of successors for each is being maintained including a count of how often each successor has been selected. For each word with from to , the algorithm will partition into subwords by scanning each letter in order from lefttoright as follows: If is the rightmost symbol of , then pick the successor to produce everything that remains in (the parts of that have not been matched in the derivation). If is not the rightmost symbol, then pick the successor, if one exists, from the current list for out of those that match the next symbols of that has been selected most often so far. One issue with this algorithm is that early on, especially for the first few symbols scanned, the list of successors for each
is (or is near) empty or none match, and so the greedy process is not able to make a choice from the existing list. Sometimes in the literature, this type of problem is resolved using a random forest algorithm
breiman2001random , but this was found to not work well in this instance (discussed below). A search algorithm was used in these instances to pick successors when nothing in the list is matched.For this, the algorithm keeps a vector
of nonnegative integers. When there is no successor in the current list for that matches the next symbols in , the algorithm retrieves the next unused integer from , say, and then a successor is selected using the next symbols from . To find (a single sequence for all letters), a search algorithm is used (this search process will be described in Subsection 4.2). However, this process raises a new question: how many successor length choices are needed; i.e., what should be ?The best value for is the number of distinct successors in the best S0Lsystem, which is difficult to accurately predict in advance. However, if a guess at is made, then either the guessed will be exactly right, too low, or too high. If is exactly right, then there are no issues. If is too high, then so too will be the execution time, i.e., the search space becomes larger than necessary. If is too low, then it is possible that the algorithm may reach a point where the greedy algorithm cannot make an appropriate choice and there is no element left in . In this case, then either the search can be restarted with a higher value of or could be extended by one dimension. However, some solutions will encounter this issue repeatedly, thus requiring many extensions leading to an increased execution time. In practice, simply extending each time as needed is probably intractable; e.g., there would be candidate solutions that would required hundreds or thousands of additional dimensions. A balance can be achieved by having the implementation extend by a limited additional number of times per word of (PMITS0L uses a limit of one), which can help find correct solutions while keeping the expansion of the search space reasonable. If more than the chosen limit is needed, then a higher value of is required.
Lastly, the process will terminate with error when none of the methods above can produce a successor for the current symbol (this only occurred when was too small in our tests). In this case, needs to be incremented; however, it is unclear what is an optimal value for the increment. A small increment for minimizes the growth of the search space, but increases the chance that the search will fail again. A large increment for increases the chance that the subsequent search will succeed, but will possibly make the search space larger than necessary (certainly an increment of will eventually find the correct ). In our experiments, a value of that is too low is simply considered a failed state in order to better understand its effect on the time taken to complete. The value of can be increased manually or automatically if desired.
In summary, for a given integer sequence , the successor selection process works by choosing the first applicable rule when scanning each character in . Let be a programming variable.

If scanning the last symbol of the current word, then select all remaining symbols in the produced word,

An existing successor in the list for matches the next symbols in ; the most frequently chosen thus far is selected greedily,

Build a successor of length , and increase by one,

Terminate with error.
Given an appropriately chosen , this process can determine a compatible S0Lsystem. Furthermore, notice, that this procedure is determining the derivations associated with traces in . However, the solution need not be optimal (the greedy choice might not lead to the best solution).
3.3 Searching for Successor Lengths
Next, the method for determining the integer sequences is described. Two algorithms, standard genetic algorithm (SGA) and exhaustive search (ES), are evaluated. For both, a literal encoding back_geneticalgorithm is used, so a search space is constructed of integer dimensions with bounds from to (as discussed and justified later in Section 4.2, the maximum number of symbols in each successor is assumed to be although this could be increased). For each sequence searched, Procedure 3.2 is used to predict an S0Lsystem compatible with , and the derivations corresponding to each trace in . The fitness value of is defined to be the probability of these derivations occurring with the traces in , which is given in Equation (1). Algorithm 1 shows the pseudocode for PMITS0L.
When the search technique is set to SGA, it uses roulette wheel selection, uniform crossover, uniform mutation, and elite survival operators back_geneticalgorithm . An SGA with simple operators is selected as little is known about the search space and an SGA provides easily tunable mechanisms for balancing exploration and exploitation. Briefly, an SGA works by the following steps back_geneticalgorithm . It consists of a population of genomes, where each genome in this case is a candidate vector consisting of genes. The initial population is produced randomly. The main processing loop of an SGA performs four steps: selection, crossover, mutation, and survival. In the selection step, pairs of genomes are selected from the population. The crossover step takes each pair and randomly swaps zero or more genes between them, and since there are pairs, this results in new genomes. Each new genome is then mutated by randomly changing zero or more genes to a new value. The survival step merges the initial and new populations. Each member of the population is assessed a fitness value as described above (using Procedure 3.2 with from the population), and the population is sorted by fitness value. The top genomes are preserved and the remainder are culled. With respect to Algorithm 1, for an SGA corresponds to performing the selection, crossover, mutation, and survival steps. is the entire population of genomes. The SGA terminates by convergence detection ( in Algorithm 1), which functions as follows. Termination occurs as follows: every time a new best solution is found, a variable records the number of the current generation. If additional iterations occur without a new best solution being found, then the SGA terminates.
To optimize the control parameters of the SGA, a hyperparameter search was done
hyperparameter using a random search with trials. As a result of the hyperparameter search, the control parameters were set as follows: population size of , crossover weight of , and a mutation weight of .In contrast, the ES simply iterates through all possibilities until it terminates, keeping track of the S0Lsystem with the highest fitness value. Since later dimensions are often unused, it would be preferred for these dimensions to be searched last, i.e., to search deeply into the leading dimensions, hence a depth first search is used. corresponds to searching one step deeper. is the number of candidates solutions in this iteration, which is always with ES. returns when there are no more nodes to search.
3.4 The Prefix Limitation
In the conference paper bernard_pmitsol , this earlier version of PMITS0L had an identified limitation, when for at least one , there are two or more successors, and one successor is a prefix of the other bernard_pmitsol . Let be two successors of , such that , and are words over . In this case, if the shorter successor () is encountered first in , then assuming all other successors are correct, for all future instances of , the greedy choice (rule of Procedure 3.2) will always select as the successor as the next symbols can be positively associated to . This effect is shown in Example 3.4 below. Since the introduction of PMITS0L bernard_pmitsol , a technique has been found to often correct this limitation, described next, although it is only used with ES and not with the SGA (due to complications to be discussed).
Let and . Suppose that the successors for in the original system are (with some associated probabilities that are not important for the example). Finally, assume (the length of ), and the search algorithm has a candidate solution with the value . The first will be assigned the successor based on the value found by the search algorithm (rule ). The second will be also assigned based on the greedy choice (rule ), but this decision is incorrect as the desired choice is . Finally, the third will also have the wrong successor of (chosen by rule ).
In this example, in order to find the correct successor, the greedy choice (rule #2) needs to be ignored and instead to use the value in the search space () should be used to find the successor . Instead, is modified to be of the form . Procedure 3.2 is modified so that after using for the length in rule , it only allows at most greedy choices before forcing it to not apply any more greedy choices (skip rule 2 and apply rule 3). Consider the following example:
Suppose the first few values of are . Then, the first five successors have been found by:

Build a successor of length since

A greedy choice

A greedy choice

A greedy choice

Stop allowing greedy choices since

Build a successor of length since )

A greedy choice

Stop allowing greedy choices since

Build a successor of length since
In this way, the search procedure used to produce also dictates exactly when new productions should be created according to , even if a greedy choice can be applied. Furthermore, the modified search space can be pruned if the end of is reached and not all elements in have been used up to (other vectors with different values in the unused parts do not need to be considered). Similarly, if the values leads to an incompatibility, then certain vectors can be pruned. Since it is easier to prune these values with ES, adjusting PMITS0L to remove the prefix limitation requires the use of ES.
Algorithmically, PMITS0L has a Boolean control parameter called prefix limitation (henceforth, ) that controls whether it uses this alternate procedure to address the limitation. Where relevant for discussing differences (e.g., Results) PMITS0L+PL indicates that the prefix limitation variable it set to true, while PMITS0LPL indicates that it is set to false.
It is acknowledged that while this technique can address the prefix limitation, it is somewhat inefficient (shown in Section 5). Finding a more efficient technique to address this limitation is an area for future investigation.
4 Evaluation Methodology
This section describes the experimentation used to evaluate PMITS0L at successfully inferring (parsimonious) compatible S0Lsystems for one or more sequences of strings. This section starts by discussing the metrics used to measure the success of PMITS0L. This is followed by a description of the procedural generation process used to produce the test set.
4.1 Performance Metrics
The metrics used to evaluate PMITS0L are described next. In these metrics, the original system is the hidden Lsystem that generated the input strings, and the candidate is the predicted Lsystem. While the original Lsystem can be thought of as the ground truth, it is actually possible to find an Lsystem that is more likely to generate the input than the original Lsystem. Therefore, the main goal is to find one with the highest probability of generating rather than the original. This can occur when the original Lsystem produces by chance that is unlikely for it. As the number of string sequences in increases, the less likely this should be. Hence, the main research question of this work is to investigate the effect of inferring a candidate Lsystem when multiple sequences of strings are used as input.
Several measures are used to assess accuracy, which is necessary in order to properly capture different ways in which S0Lsystems can differ: success rate (SR), mean time to solve (MTTS), weighted true positive  system to candidate (WTPH2C), weighted true positive  candidate to system (WTPC2S), probability error (), maximum successor difference (diffmax), and successor difference rate (diffrate), which will be described next.
Success for PMITS0L is defined as inferring an S0Lsystem that is compatible with
that has equal to or greater probability of producing the sequence(s) than the original system (finding an Lsystem that is slightly worse than the original would therefore be classified as not successful with this stringent measure). Thus, success rate is the percentage of experiments for a set of control parameters that successfully finds such an S0Lsystem.
MTTS is the amount of time required by PMITS0L to find a candidate system, whether successful or not. Execution time is limited to hours to keep overall experimental times practical (in practice, a user may be willing to wait longer for a result). All timings were captured using a single core of an Intel 4770 @ 3.4 GHz with 12 GB of RAM on Windows 10.
Within the context of this work, a true positive is defined as a successor that is in both the original Lsystem and the candidate Lsystem regardless of any difference in their associated probability. One could also certainly define the following: a false positive as a successor that is in the candidate Lsystem but not in the original, and a false negative is the opposite. It would be possible to simply count the true positives, false positives and false negatives with these definitions. However, there are two issues: first, the terms false positive and false negative implies that original Lsystem is the better Lsystem, which might not be the case. Second, only counting successors ignores the probabilities and successors with a higher associated probability are more important toward reproducing a process in silico. Hence, the measures listed next are used instead, and they are weighted to favor those with a higher associated probability.

Weighted True Positive  System to Candidate (WTPS2C) is the sum of associated probabilities for true positive successors in the original system divided by .

Weighted True Positive  Candidate to System (WTPC2S) is the sum of associated probabilities for true positive successors in the candidate system divided by .
Ideally, the weighted true positive values above should be 1.00 indicating a perfect match. When a match between the hidden and candidate Lsystems is imperfect, greater values generally indicate that successors with higher associated probabilities have been matched.
Probability error () is calculated by taking each production in the original system; if that production is also in the predicted system, then add the absolute difference of the two probabilities; if the production is not present, then add the probability of it occurring (in the original system). Once all productions have been examined, the total is divided by the number of symbols in the alphabet. Thus, this metric is measuring how different the predicted Lsystem is from the original even taking into account the differences in probabilities of the rewriting rules. It is an important metric for the accuracy of inferring S0Lsystems.
While it is argued that successors with a higher associated probability are more important towards successfully simulating a process, the measures maximum successor difference (diffmax), and successor difference rate (diffrate) provide an alternative viewpoint to the weighted metrics above on how well the candidate and hidden system match. The maximum successor difference is the largest count of the number of successors that are in the hidden Lsystem but are not in the candidate Lsystem across the experiments. While this could be averaged over the number of productions, the intent of this measure is to examine if the number of extra or missing successors varies with (as opposed to by number of productions and ). The successor difference rate is the percentage of experiments where the successor difference was not zero.
4.2 Data
Since there are very few published S0Lsystems in the literature, the test cases for PMITS0L are mainly procedurally generated described as follows.
The procedural generation is based on observations of Lsystems found in the University of Calgary’s virtual laboratory algorithmicbotany . The observations focused on alphabet size, successor rule length, the number of successors per symbol, and the number of distinct letters in successors. Based on the observations, the alphabet size is between and symbols, the successor length may vary from to , each symbol may have between and successors, and each successor has to distinct symbols. Graphical symbols (that are part of the turtle interpretation) are not used explicitly. Therefore, our experimental results are reflective of the symbols being totally arbitrary without any assumptions on the alphabet.
Alphabet size is not explicitly picked, but instead is based on a value that is the total number of successors. Symbols are then iteratively added from () with a randomly selected number of successors for each symbol added until is reached. Each symbol has a chance of having successor, a chance to have successors, and a chance to have successors. If a number of successors for a symbol is selected such that would be exceeded, then it is reduced to ensure that this does not happen. It is possible by chance that all symbols would have successor, which would be a deterministic Lsystem. If this occurs, then two symbols from are picked randomly; e.g. and . The symbol is given another successor, and is removed ensuring the total number of successors remains correct, and that all produced systems are stochastic, but not deterministic.
When a symbol has or more successors, the associated probability () for each successor is found by iterating over the number of successors, and selecting a random value between an upper and lower bound , which are programming variables. The upper and lower bounds are initialized such that and . After each iteration, . The last successor has . The next step is to construct a word for each successor over .
To determine the length of the successor, the following probability distribution is used to determine each successor’s length (expressed as
 , where is the chance the successor has length ):  1,  2,  3,  4,  5,  6,  7,  1,  1, and  10. Finally, the probability distribution for the number of distinct symbols is evenly divided from to ( chance each). Random symbols are then selected from until the successor length is reached.Using the procedure above, three data sets are generated to evaluate PMITS0L.
The first and second data set are called (data set prefix limitation) and (data set no prefix limitation) respectively and both enforce that to isolate any effects from higher values of . furthermore enforces that all produced systems have no cases where and for any , where and are words over ; i.e., no common prefix for two successors of a symbol . has no restriction on prefixes. The intent of these data sets is to allow an evaluation of the effect of addressing the prefix limitation, so when PMITS0L is executed on the vector produced contains only successor length values. For , was iterated from to , Lsystems were generated for each value of , and each experiment was conducted twice. For , was iterated from to (only because it was clear that PMITS0L would timeout with ), Lsystems were generated for each value of , and each experiment was conducted twice.
To test the effect of using different numbers of sequences of strings , an S0Lsystem is generated using the process described above with the following parameters. (total number of successors) is selected as a random value from to . The lower bound of is the lowest possible value for and S0Lsystem with , and is not considered as it is uninteresting. The upper bound was determined by the experiments with described above. For each generated S0Lsystem, it generates input sequences for each iterated from to . For any single experiment, when , the exact same sequence of strings is not permitted to be generated; if this occurs, a new sequence of strings is generated until it differs from all of the sequences produced so far. Sixty S0Lsystems were generated in this fashion, and for each combination of and , the experiment was done twice (for a total of experiments). The dataset for these experiments is denoted as (data set varying ).
5 Results and Discussion
This section provides the results of the evaluation of PMITS0L. First, the evaluation of PMITS0L on the procedurally generated S0Lsystems (the main result) is discussed, including the effects of addressing the socalled prefix limitation (previously described in Section 3.4). Afterwards, observations regarding the inference of the Lsystem for Japanese Cypress trees are described. The variations of PMITS0L executing with and without the prefix limitation process are denoted as PMITS0L+PL and PMITS0LPL respectively when needed for clarity.
A preliminary investigation was done using only greedy algorithm, then random forest, before building a hybrid greedy algorithm with a simple genetic algorithm (SGA), and exhaustive search (ES). As they were preliminary, the results of these experiments can be found in Table SD2 of the supplementary information; however, in summary both the greedyonly algorithm and the random forest were found to be ineffective relative to the hybrid algorithm. Therefore, the remainder of this section focuses on the hybrid algorithms.
It was found that PMITS0LPL is able to reliably infer S0Lsystems with up to successors when (shown in Figure 2). However, some minor variations were noticed between the candidate solutions and the original S0Lsystems shown on the two WTP lines. While SGA was not as successful as exhaustive search, it is much faster, peaking at about minutes for successors (see Table SD3, time for ES described below). A fitness landscape analysis was completed; however, there were no characteristics to the search space that suggested a particular way forward. Thus, an avenue for future investigation is to investigate the search space more deeply to find ways to avoid the use of ES for reasons of performance.
The effect on MTTS on inferring S0Lsystems with and without the prefix limitation process enabled is shown Figure 3 (the other metrics were not significantly effected so these are included in the Table SD4). It shows that PMITS0L+PL is slower than the other variants. It is difficult to objectively say whether the time increase is worth the ability to infer S0Lsystems where a symbol has a successor that is a prefix of another, as it is unclear how often this occurs in practice. Subjectively, considering the possibility that the encoding scheme described herein to address the prefix limitation had the potential to be extremely large, the time increase seems reasonable. This suggests that, at least for the procedurally generated data sets, the method for removing the prefix limitation with pruning is effective.
The main research goal of this paper is investigating the effect of inferring an S0Lsystem when using various numbers of sequences of strings as an input. Figure 4 shows how the probability error, WTPC2S and WTPS2C change as increases. This raw data is shown in Table 2, which also shows the mean time to solve, the maximum successor difference, and successor difference rate. SR is not shown as it was always , for each value of with PMITS0L+PL and dataset . It can been seen that from , PMITS0L+PL infers the original system for all test cases, as the WTP values are all . With respect to error in the probability distribution, this becomes subjectively reasonable at with average error. Error seems to be close to minimal at (at approximately ), although it continues to decline at a small rate as increases. Increasing has a negligible effect on MTTS since to are only scanned when an S0Lsystem has been found to be compatible with , and this is similar to the scanning process which takes submillisecond time in all tests cases. So, adding additional sequences of strings is generally beneficial in practice.
With respect to differences between the candidate system and the original system, it can be seen that when there can be up to two successor differences. These nonmatching productions tend to be for low probability successors as can be seen from the values for WTP. When , there was a single instance where PMITS0L+PL did not exactly match the original system. Again, the difference was a very low probability successor. In examining this experiment more closely, it was found that the successor in the original system had only been selected once and the symbols lined up in such a fashion that the candidate system found by PMITS0L+PL was reasonable given the data. Still, overall, in practice it would be recommended to have at least sequences of strings to infer an S0Lsystem that closely models an underlying process reliably.
WTPS2C  WTPC2S  diffmax  diffrate  MTTS  

1  
2  
3  
4  
5  
6  
7  
8  
9  
10 
With respect to inferring the S0Lsystem found by Nishida nishida_k0l for modeling Japanese Cypress, PMITS0L+PL was not able to infer it in a practical amount of time. The first experiment was to set , which is much greater than the approximate maximum of
successors in the other experiments. After several hours, this was terminated and it was estimated that PMITS0L+PL would take at least
hours to complete using exhaustive search in a sequential fashion.6 Conclusions and Future Directions
This paper presents an investigation into inferring stochastic contextfree Lsystems (S0Lsystems) when using different numbers of sequences of strings with the Plant Model Inference Tool for S0Lsystems (PMITS0L). PMITS0L is a generalized algorithm for inferring S0Lsystem and requires no a priori scientific knowledge when compared to existing approaches for inferring S0Lsystems algorithmically (e.g. danks_s0lproteinfolding ; beauty ; rongier_sedimentary ). Being generalized means that PMITS0L may be used for any problem as opposed to requiring a specific algorithm for each individual problem in a specific research domain. PMITS0L opens up the possibility of inferring S0Lsystems in research domains where none have been found to date.
PMITS0L is primarily evaluated on procedurally generated S0Lsystems due a shortage of specific systems published in the literature. An analysis was done of existing Lsystems to create realistic procedural generation rules. Three data sets, for a total of Lsystems (a total of experiments were conducted across these systems), were generated to evaluate different aspects of PMITS0L.
PMITS0L has two different modes of operation controlled by a prefix limitation Boolean parameter (denoted as PMITS0L+PL and PMITS0LPL). The prefix limitation parameter controls if it searches all Lsystems, or only those without multiple successors of the same letter, where one is a prefix of the other. The results show PMITS0LPL is quite a bit faster than PMITS0L+PL. For an S0Lsystem with successors, PMITS0LPL will succeed on average in about minutes, while PMITS0L+PL takes several hours. However, PMITS0L+PL is still practical considering the effort required to infer an S0Lsystem by hand.
PMITS0LPL was found be an extremely accurate tool for inferring a compatible S0Lsystem that is at least as probable as the original system. This paper shows that PMITS0L+PL was always able to find the original system when at least three sequences of strings were used as inputs. Additionally, the evaluation showed that when six sequences were used, the error in the solution’s probability distribution compared to the original system becomes low (about or less), and there is not much improvement for adding additional sequences of strings. Therefore, it is recommended that in practice at least sequences of strings should be used to infer S0Lsystems, but or more is ideal to minimize the error in probabilities. Finally, there is essentially no penalty to execution time for adding additional sequences of strings, so there is little reason to avoid using all of the string sequences that are available.
PMITS0L (in either mode) can be used with an exhaustive search, which is not an efficient searching algorithm. However, with normal branchandbound pruning, it could infer all Lsystems in a test suite of Lsystems with at most productions in about hours. Genetic algorithm was also evaluated, and was much faster; however, it was less accurate.
There are several directions that can be taken with this research in the future. Perhaps most importantly comes from applying this research to the practical problem of inferring Lsystems from segmented images. De La Higuera de2005bibliographical argues that to be realistic, Lsystem inference algorithms should handle errors in the strings; e.g., insertions or deletions of symbols. These errors can be, at least initially, considered stochastic productions; i.e., if has the production , but periodically is subject to a deletion error such that , then this can be thought of as a stochastic production. With a set of stochastic productions inferred, it is then a matter of detecting and fixing any errors.
Additionally, parallelism will be used to speed up PMITS0L; however, it still would be useful to investigate more efficient searching techniques to allow PMITS0L to infer S0Lsystems with larger number of productions at practical speeds. Additionally, the main issue with using PMITS0L in a practical fashion is the need to select a reasonable value for the size of the vector used for searching (which roughly corresponds to the number of productions). An algorithm that does not require this parameter would be ideal, or alternatively finding a good way to compute or estimate the vector size.
7 Acknowledgments
This research was undertaken thanks in part to funding from the Canada First Research Excellence Fund, National Science Engineering Research Council grant #201606172, and the Alexander Graham Bell scholarship for Jason Bernard. Additionally, the authors would like to thank Dr. Farhad Maleki for his assistance in acquiring the images for this paper.
References
 (1) A. Lindenmayer, Mathematical models for cellular interaction in development, parts I and II, Journal of Theoretical Biology 18 (3) (1968) 280–315.
 (2) P. Prusinkiewicz, L. Mündermann, R. Karwowski, B. Lane, The use of positional information in the modeling of plants, in: Proceedings of the 28th Annual Conference on Computer Graphics and Interactive Techniques, ACM, 2001, pp. 289–300.
 (3) P. Prusinkiewicz, A. Lindenmayer, The Algorithimic Beauty of Plants, Springer Verlag, New York, 1990.
 (4) M. A. GalarretaValverde, M. M. Macedo, C. Mekkaoui, M. Jackowski, Threedimensional synthetic blood vessel generation using stochastic Lsystems, in: Medical Imaging: Image Processing, 2013, p. 86691I.
 (5) G. Rongier, P. Collon, P. Renard, Stochastic simulation of channelized sedimentary bodies using a constrained Lsystem, Computers & Geosciences 105 (2017) 158–168.
 (6) F. BenNaoum, A survey on Lsystem inference, INFOCOMP Journal of Computer Science 8 (3) (2009) 29–39.
 (7) M. Agu, Y. Yokoi, A stochastic description of branching structures of trees, Journal of Theoretical Biology 112 (4) (1985) 667–676.
 (8) T. Nishida, K0Lsystem simulating almost but not exactly the same developmentcase of japanese cypress, Memoirs of the Faculty of Science 8 (1) (1980) 97–122.
 (9) G. Danks, S. Stepney, L. Caves, Protein folding with stochastic Lsystems, in: 11th International Conference on the Simulation and Synthesis of Living Systems, 2008, pp. 150–157.
 (10) P. Doucet, The syntactic inference problem for D0Lsequences, L Systems (1974) 146–161.
 (11) H. Feliciangeli, G. T. Herman, Algorithms for producing grammars from sample derivations: a common problem of formal language theory and developmental biology, Journal of Computer and System Sciences 7 (1) (1973) 97–118.
 (12) J. Bernard, I. McQuillan, A fast and reliable hybrid approach for inferring Lsystems, in: Proceedings of the 2018 International Conference on Artificial Life, MIT Press, 2018, pp. 444–451.
 (13) I. McQuillan, J. Bernard, P. Prusinkiewicz, Algorithms for inferring contextsensitive Lsystems, in: 17th International Conference on Unconventional Computation and Natural Computation, Vol. 10867 of Lecture Notes in Computer Science, Springer International Publishing, 2018, pp. 117–130.

(14)
J. Ubbens, M. Cieslak, P. Prusinkiewicz, I. Stavness, The use of plant models in deep learning: an application to leaf counting in rosette plants, Plant Methods 14 (1) (2018) 6.

(15)
J. Bernard, I. McQuillan, Inferring stochastic Lsystems using a hybrid greedy algorithm, in: Proceedings of the 30th International Conference on Tools with Artificial Intelligence, IEEE, 2018, pp. 600–607.
 (16) P. Eichhorst, W. J. Savitch, Growth functions of stochastic Lindenmayer systems, Information and Control 45 (3) (1980) 217–228.
 (17) P. Prusinkiewicz, R. Karwowski, B. Lane, The L+C plant modelling language, FunctionalStructural Plant Modelling in Crop Production 22 (2007) 27–42.
 (18) J. Bernard, I. McQuillan, New techniques for inferring Lsystems using genetic algorithm, in: Proceedings of the 8th International Conference on Bioinspired Optimization Methods and Applications, Vol. 10835 of Lecture Notes in Computer Science, Springer, 2018, pp. 13–25.

(19)
L. Breiman, Random forests, Machine learning 45 (1) (2001) 5–32.

(20)
T. Back, Evolutionary Algorithms in Theory and Practice: Evolution Strategies, Evolutionary Programming, Genetic Algorithms, Oxford University Press, 1996.
 (21) J. Bergstra, Y. Bengio, Random search for hyperparameter optimization, Journal of Machine Learning Research 13 (2012) 281–305.

(22)
University of Calgary, Algorithmic
Botany.
URL http://algorothmicbotany.org 
(23)
C. De La Higuera, A bibliographical study of grammatical inference, Pattern Recognition 38 (9) (2005) 1332–1348.
Comments
There are no comments yet.