Bit-coded regular expression parsing

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Standard

Bit-coded regular expression parsing. / Nielsen, Lasse; Henglein, Fritz.

Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings. ed. / Adrian-Horia Dediu; Shunsuke Inenaga; Carlos Martín-Vide. Springer, 2011. p. 402-413 (Lecture notes in computer science, Vol. 6638).

Research output: Chapter in Book/Report/Conference proceedingArticle in proceedingsResearchpeer-review

Harvard

Nielsen, L & Henglein, F 2011, Bit-coded regular expression parsing. in A-H Dediu, S Inenaga & C Martín-Vide (eds), Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings. Springer, Lecture notes in computer science, vol. 6638, pp. 402-413, 5th International Conference on Language and Automata Theory and Applications, Tarragona, Spain, 26/05/2011. https://doi.org/10.1007/978-3-642-21254-3_32

APA

Nielsen, L., & Henglein, F. (2011). Bit-coded regular expression parsing. In A-H. Dediu, S. Inenaga, & C. Martín-Vide (Eds.), Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings (pp. 402-413). Springer. Lecture notes in computer science Vol. 6638 https://doi.org/10.1007/978-3-642-21254-3_32

Vancouver

Nielsen L, Henglein F. Bit-coded regular expression parsing. In Dediu A-H, Inenaga S, Martín-Vide C, editors, Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings. Springer. 2011. p. 402-413. (Lecture notes in computer science, Vol. 6638). https://doi.org/10.1007/978-3-642-21254-3_32

Author

Nielsen, Lasse ; Henglein, Fritz. / Bit-coded regular expression parsing. Language and Automata Theory and Applications: 5th International Conference, LATA 2011, Tarragona, Spain, May 26-31, 2011. Proceedings. editor / Adrian-Horia Dediu ; Shunsuke Inenaga ; Carlos Martín-Vide. Springer, 2011. pp. 402-413 (Lecture notes in computer science, Vol. 6638).

Bibtex

@inproceedings{d3a5020eab35414bb3c794624ea5b49c,
title = "Bit-coded regular expression parsing",
abstract = "Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dub ´e and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli{\textquoteright}s greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA- nodes, but that the latter may still perform better on regular expressions that are “more deterministic” from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)",
author = "Lasse Nielsen and Fritz Henglein",
year = "2011",
doi = "10.1007/978-3-642-21254-3_32",
language = "English",
isbn = "978-3-642-21253-6",
series = "Lecture notes in computer science",
publisher = "Springer",
pages = "402--413",
editor = "Adrian-Horia Dediu and Shunsuke Inenaga and Carlos Mart{\'i}n-Vide",
booktitle = "Language and Automata Theory and Applications",
address = "Switzerland",
note = "5th International Conference on Language and Automata Theory and Applications, LATA 2011 ; Conference date: 26-05-2011 Through 31-05-2011",

}

RIS

TY - GEN

T1 - Bit-coded regular expression parsing

AU - Nielsen, Lasse

AU - Henglein, Fritz

N1 - Conference code: 5

PY - 2011

Y1 - 2011

N2 - Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dub ´e and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli’s greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA- nodes, but that the latter may still perform better on regular expressions that are “more deterministic” from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)

AB - Regular expression parsing is the problem of producing a parse tree of a string for a given regular expression. We show that a compact bit representation of a parse tree can be produced efficiently, in time linear in the product of input string size and regular expression size, by simplifying the DFA-based parsing algorithm due to Dub ´e and Feeley to emit the bits of the bit representation without explicitly materializing the parse tree itself. We furthermore show that Frisch and Cardelli’s greedy regular expression parsing algorithm can be straightforwardly modified to produce bit codings directly. We implement both solutions as well as a backtracking parser and perform benchmark experiments to gauge their practical performance. We observe that our DFA-based solution can be significantly more time and space efficient than the Frisch-Cardelli algorithm due to its sharing of DFA- nodes, but that the latter may still perform better on regular expressions that are “more deterministic” from the right than the left. (Backtracking is, unsurprisingly, quite hopeless.)

U2 - 10.1007/978-3-642-21254-3_32

DO - 10.1007/978-3-642-21254-3_32

M3 - Article in proceedings

SN - 978-3-642-21253-6

T3 - Lecture notes in computer science

SP - 402

EP - 413

BT - Language and Automata Theory and Applications

A2 - Dediu, Adrian-Horia

A2 - Inenaga, Shunsuke

A2 - Martín-Vide, Carlos

PB - Springer

T2 - 5th International Conference on Language and Automata Theory and Applications

Y2 - 26 May 2011 through 31 May 2011

ER -

ID: 50858641