Comments and Feedback

Behaviour change between 1.5.6 and 1.5.2
Pyparsing 1.5.6 transforms certain expressions more aggressively than 1.5.2 did, in particular with setResultsName. Consider the following example with a stripped-down grammar that is manipulated to produce a tree in a separate function (this is supposed to illustrate why this bites me):
code format="python"

def getGrammar():
    from pyparsing import (Forward, CaselessKeyword, Word, alphas,
        ParserElement, ZeroOrMore, Literal, Optional)

    tableName = Word(alphas)
    joinedTable = Word("+-")
    tableReference = (joinedTable
        | tableName)
    fromClause = (CaselessKeyword("FROM")
        + tableReference)("fromClause")

    # collect every grammar element defined above, keyed by its variable name
    return dict((k, v) for k, v in locals().iteritems()
        if isinstance(v, ParserElement))

def enableTree(syms):
    # makeAction gives each symbol its own action that tags the tokens with the symbol's name
    def makeAction(name):
        def action(s, pos, toks):
            return [name, toks]
        return action
    for name in syms:
        syms[name].addParseAction(makeAction(name))

if __name__=="__main__":
    import pprint
    syms = getGrammar()
    enableTree(syms)
    pprint.pprint(syms["fromClause"].parseString("FROM ab").asList())

code
With pyparsing 1.5.2, this would print:
code

['fromClause', ['FROM', 'tableReference', ['tableName', ['ab']]]]

code
whereas 1.5.6 folds the subexpressions into fromClause (but only when fromClause carries a results name); with this particular scheme, this has user-visible consequences in that the program prints:
code

['fromClause', ['FROM', 'ab']]

code
I admit I've not really traced this yet, but since I suspected that is now being called more liberally, I tried to inhibit its actions by ing the RHS symbols in the fromClause rule before use. Alas, to no avail. So, is there anything more sensible I can do to get 1.5.2 results from 1.5.6? Or do you consider my scheme of adding actions long after the symbols have been defined too harebrained? -- Markus
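The makeAction factory inside enableTree is what gives every symbol its own label. If the actions were instead created as lambdas in a loop, Python's late binding would make every one of them report the last name. A minimal Python 3 illustration, independent of pyparsing (make_action, names and the two dicts are hypothetical):

```python
def make_action(name):
    # Each call creates a new scope, so each returned action keeps its own `name`.
    def action(s, loc, toks):
        return [name, toks]
    return action

names = ["fromClause", "tableReference", "tableName"]

# Factory version: one independent closure per symbol name.
factory_actions = {n: make_action(n) for n in names}

# Lambda version: every lambda looks `n` up when it is *called*, and by then
# the comprehension has finished with n == "tableName".
late_bound_actions = {n: (lambda s, loc, toks: [n, toks]) for n in names}

print(factory_actions["fromClause"]("FROM ab", 0, ["ab"]))     # ['fromClause', ['ab']]
print(late_bound_actions["fromClause"]("FROM ab", 0, ["ab"]))  # ['tableName', ['ab']]
```

This only explains why the factory is needed; it does not change how 1.5.6 folds sub-expressions into fromClause.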

Python 2 version works. (refer to http://pypi.python.org/pypi/pyparsing/1.5.6)
(edit 2)

I used the installers at http://pypi.python.org/pypi/pyparsing/1.5.6 and tried the Python 3 version. It does not work for Python 3, and running the 2to3 tool did not help. I tried the version for Python 2.7, and that works.

some ... seems to just take the pyparsing code unchanged, create installers for Python 3, and publish them untested on the PyPI website. So for Python 3, check out LEPL (edit 2: and Modgrammar) instead of Pyparsing.

However, thanks for Pyparsing for Python 2, which is a nice, working parser.

Parse Result Object Behaves in Unexpected Ways
Getting any undefined attribute of a parse result object returns an empty string. This can lead to very confusing behavior: calling result.aslist() raises TypeError: 'str' object is not callable, when this would usually be reported as an attribute error. This could let a lot of problems pass quietly.

-Matt G. (meawoppl at some google mail service)
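Python only calls a class's __getattr__ after normal attribute lookup has failed, so a results object that answers every unknown name with "" turns a typo such as aslist into a string, and the mistake only surfaces when that string is called. Returning "" presumably lets code read an optional results name without a try/except. A toy Python 3 stand-in (LenientResults and StrictResults are hypothetical classes, not pyparsing's ParseResults) shows both behaviours:

```python
class LenientResults:
    """Toy stand-in for a results object (not pyparsing's ParseResults)."""

    def __init__(self, tokens):
        self._tokens = list(tokens)

    def asList(self):
        return list(self._tokens)

    def __getattr__(self, name):
        # Called only when normal lookup fails, e.g. for a missing
        # results name or a typo such as "aslist".
        return ""


class StrictResults(LenientResults):
    def __getattr__(self, name):
        # Fail at the point of the mistake instead of returning "".
        raise AttributeError("%s has no attribute %r" % (type(self).__name__, name))


r = LenientResults(["a", "b"])
try:
    r.aslist()                 # r.aslist is "", so calling it fails
except TypeError as exc:
    print("lenient:", exc)     # lenient: 'str' object is not callable

try:
    StrictResults(["a", "b"]).aslist()
except AttributeError as exc:
    print("strict:", exc)
```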

Simultaneous rules and why tabs must be special by default
1.

I need to check that a message A conforms to a fairly complex BNF grammar and that message A's length is not greater than 'l'. Of course I can parse the message and then check that the length limit is not violated. The problem is that message A is part of a bigger grammar B, and there might be multiple instances of message A inside B.

More generally, is there an easy way to make an element that has two or more rules that must be valid at the same time?

For the length example above, this could mean something like:
code format="python"
elementWithMaxLength = complexElement & Regex('.{1,%d}' % l, flags=re.S+re.M).suppress()
code
... with setParseAction perhaps?

2.

Could the default value for 'keepTabs' be True? It was annoying to find out that tabs are special.

-- kummahiih
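Note that in pyparsing the & operator builds an Each (all of the sub-expressions, in any order) rather than applying two rules to the same text. One way to express "both rules must hold" is to match the grammar first and then test the matched span, which is roughly what the setParseAction idea amounts to, since a parse action can raise ParseException to reject a match. A plain-Python sketch of that idea, assuming rules are functions from (text, loc) to an end location or None; regex_rule, constrained and the message pattern are hypothetical stand-ins for complexElement:

```python
import re

def regex_rule(pattern):
    """Wrap a regex as a rule: (text, loc) -> end location, or None on failure."""
    compiled = re.compile(pattern)

    def match(text, loc):
        m = compiled.match(text, loc)
        return m.end() if m else None
    return match

def constrained(rule, *checks):
    """Match `rule`, then require every check to accept the matched text."""
    def match(text, loc):
        end = rule(text, loc)
        if end is None:
            return None
        span = text[loc:end]
        return end if all(check(span) for check in checks) else None
    return match

message = regex_rule(r"[A-Z]+:\d+")     # stand-in for the complex grammar A
short_message = constrained(message, lambda span: len(span) <= 8)

print(short_message("AB:12 rest", 0))   # 5
print(short_message("ABCDEF:1234", 0))  # None (11 characters > 8)
print(short_message("x AB:1", 2))       # 6: works wherever A is embedded in B
```

Because each check receives only the text matched by the main rule, any number of extra conditions can be stacked without the length limit ever scanning past the end of message A.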

Extending ParseResults class
Hi, I've tried to extend this class into a new one, "CodeItem", that automatically tracks the code location for itself and its sub-items. I think it's convenient for reporting semantic errors. There's also a simple class ParsingError that takes a message and the problem CodeItems as parameters.

The main difficulty I had is that ParseResults changes its "appearance" depending on whether the instance is named or not, so I've re-implemented getName; __getitem__ may be a little sketchy.

Does it make sense for a parser to remember each item's offset automatically? Maybe it will cost some speed - not sure how critical this loss would be though.

Also, I've created a wrapper decorator function that adapts parseAction handler functions to use CodeItems instead of (s,loc,tok) arguments. -Evgeny.
code format="python"
from pyparsing import ParseResults, col, lineno

class ParsingError(Exception):
    def __init__(self,msg,*items):
        self.msg = msg
        self.items = items

    def __str__(self):
        out = [t.info() for t in self.items]
        return 'parsing error: %s\nproblem item(s):\n%s' \
               % (self.msg,'\n'.join(out))

def tok2str(tok):
    # flatten a token (string, list or ParseResults) back into its source text
    if isinstance(tok,ParseResults):
        return tok2str(tok.asList())
    elif isinstance(tok,list):
        out = ''
        for item in tok:
            out = out + tok2str(item)
        return out
    elif isinstance(tok,str):
        return tok
    else:
        raise Exception('internal error type=%s' % type(tok))

def toklen(tok):
    return len(tok2str(tok))

class CodeItem(ParseResults):
    def __init__(self,s,loc,t):
        if isinstance(t,ParseResults):
            name = t.getName()
            tlist = t.asList()
        else:
            name = None
            tlist = t

        ParseResults.__init__(self,tlist)

        self.__ci_name = name
        self.__ci_source = s
        self.__ci_loc = loc

    def getName(self):
        return self.__ci_name

    def __str__(self):
        tok = ParseResults.__str__(self)
        return 'line=%d col=%d tokens=%s' % (self.lineno(),self.col(),tok)

    def __repr__(self):
        return ParseResults.__str__(self)

    def info(self):
        return '(line=%3d col=%3d) %s' % (self.lineno(),self.col(),self.source())

    def col(self):
        # calls pyparsing's module-level col(loc, s), not this method
        return col(self.__ci_loc,self.__ci_source)

    def lineno(self):
        # calls pyparsing's module-level lineno(loc, s), not this method
        return lineno(self.__ci_loc,self.__ci_source)

    def source(self):
        return tok2str(self)

    def __getitem__(self,i):
        # this function could be made faster by optimizing the toklen function
        t = ParseResults.__getitem__(self,i)

        # here I might want to support slicing as well
        if isinstance(i,int):
            offset = 0
            for j in range(i):
                tj = ParseResults.__getitem__(self,j)
                offset += toklen(tj)
            loc = self.__ci_loc + offset
            s = self.__ci_source
            return CodeItem(s,loc,t)
        else:
            # here is a hole; maybe a CodeItem should be constructed instead
            # (I had problems with named ParseResults)
            return t

code
Wrapper function that converts tokens into CodeItems and can be used as a decorator for parseAction functions as defined in pyparsing.py. This one takes only named parsing results and reconfigures the parseAction to take CodeItems instead of s,loc,tok arguments.
code format="python"
import functools

def wrap_named_tokens(f):
    """filters named tokens"""
    def wrapper(s,loc,tok):
        #code_items = TokenList(s,loc,tok)
        code_items = []
        cloc = loc
        for t in tok:
            inc = toklen(t)
            # keep only named results, remembering where each one starts
            if isinstance(t,ParseResults) and t.getName() is not None:
                code_items.append(CodeItem(s,cloc,t))
            cloc = cloc + inc
        return f(code_items)
    return functools.update_wrapper(wrapper,f)

code
Example of a parseAction definition:
code format="python"
@wrap_named_tokens
def some_parse_action(code_items):
    for item in code_items:
        if is_not_good(item):
            raise ParsingError('this code has error',item)
        else:
            do_something_with(item)
code
The error handler will print line and column numbers automatically.
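CodeItem relies on pyparsing's module-level lineno(loc, s) and col(loc, s) helpers. The arithmetic behind a 1-based line and column can be sketched in plain Python (line_of and column_of are hypothetical names; pyparsing's own col may treat some edge cases differently). The example also shows one caveat of __getitem__ and wrap_named_tokens: they advance the location by summing token lengths, which leaves out any whitespace skipped between tokens, so later items can be reported at the wrong position.

```python
def line_of(loc, s):
    """1-based line number of offset `loc` in `s`."""
    return s.count("\n", 0, loc) + 1

def column_of(loc, s):
    """1-based column of offset `loc` in `s` (rfind gives -1 on the first line)."""
    return loc - s.rfind("\n", 0, loc)

source = "a = 1\nb = 2"
tokens = ["a", "=", "1", "b", "=", "2"]

true_loc = source.index("b")                   # 6
summed_loc = sum(len(t) for t in tokens[:3])   # 3: spaces and the newline are not counted

print(line_of(true_loc, source), column_of(true_loc, source))      # 2 1
print(line_of(summed_loc, source), column_of(summed_loc, source))  # 1 4
```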

Setting whitespace characters after defaulting
I am trying to parse a language which has one-line statements separated by newlines and possibly blank lines. To parse it, I tried setting the default whitespace chars to " \t" and the whitespace chars for the document parser to " \t\n", but I'm not getting the desired effect. Here's an example:
code format="python"
from pyparsing import *

ParserElement.setDefaultWhitespaceChars(" \t")

statement = Literal("foobar") | Word(nums)

statements = ZeroOrMore(statement)
statements.setWhitespaceChars(" \t\n")

document = StringStart() + statements + StringEnd()
document.setWhitespaceChars(" \t\n")

test = "5498\n foobar"
print test, "->", document.parseString(test)

code
This raises an error when it hits the newline char. However, if I manually set the whitespace chars for all items, it works as expected:
code format="python"
from pyparsing import *

statement = Literal("foobar") | Word(nums)
statement.setWhitespaceChars(" \t")

statements = ZeroOrMore(statement)
statements.setWhitespaceChars(" \t\n")

document = StringStart() + statements + StringEnd()
document.setWhitespaceChars(" \t\n")

test = "5498\n foobar"
print test, "->", document.parseString(test)
code
produces:
code
5498
 foobar -> ['5498', 'foobar']
code

Am I misunderstanding these commands, or is there a better way to do this?

-Shawn

[reply from Paul] Shawn -

Well, there is a little confusion on your part, but there is also a subtle bug in pyparsing that prevents you from doing this the actual correct way. Here is the code as I imagine it should be written:
code format="python"
from pyparsing import *

ParserElement.setDefaultWhitespaceChars(" \t")

# each statement explicitly consumes its own line ending
statement = (Literal("foobar") | Word(nums)) + LineEnd().suppress()
statements = ZeroOrMore(statement)
document = StringStart() + statements + StringEnd()

test = "5498\n foobar"
print test, "->", document.parseString(test)
code

Only a single call to setDefaultWhitespaceChars is needed; there is no need to set whitespace on individual parse expressions. However, there is a bug in StringEnd that raises an exception when reading both a LineEnd and a StringEnd at the end of the input string (which I will have fixed in the online CVS code in a few minutes). Note that in your original code that did not work, there was no place for the line breaks to be either parsed or skipped over. Setting whitespace chars only affects the skipping of whitespace at the beginning of an expression, so setting whitespace to " \t\n" for document only skips over those characters at the very beginning of document, not before each of its child elements.

I resolved the newline processing question by leaving in your call to setDefaultWhitespaceChars, and then adding an explicit parse expression to read newlines at the end of each statement, since this is the only place where you want to see newlines.

-- Paul
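Paul's point, that whitespace is skipped by each expression and only just before that expression tries to match, can be modelled with a toy parser in Python 3. This is not pyparsing's implementation: Element and parse_statements are hypothetical, each Element skips only its own whitespace set, and line_end plays the role of LineEnd().suppress() in the corrected grammar.

```python
import re

class Element:
    """Toy element: skips ITS OWN whitespace set, then matches a regex."""

    def __init__(self, pattern, whitespace=" \t"):
        self.regex = re.compile(pattern)
        self.whitespace = whitespace

    def parse(self, text, loc):
        # Leading whitespace is skipped only here, just before this element matches.
        while loc < len(text) and text[loc] in self.whitespace:
            loc += 1
        m = self.regex.match(text, loc)
        if m is None:
            raise ValueError("no match at offset %d" % loc)
        return m.end(), m.group()

def parse_statements(text, statement, line_end=None):
    """Repeat `statement` (plus an optional explicit line_end) until text is used up."""
    loc, tokens = 0, []
    while loc < len(text):
        loc, token = statement.parse(text, loc)
        tokens.append(token)
        if line_end is not None:
            loc, _ = line_end.parse(text, loc)
    return tokens

statement = Element(r"foobar|\d+")   # whitespace " \t": newlines are NOT skipped
line_end = Element(r"\n|$")          # explicit newline, or end of text

print(parse_statements("5498\n foobar", statement, line_end))  # ['5498', 'foobar']
try:
    parse_statements("5498\n foobar", statement)
except ValueError as exc:
    print("without line_end:", exc)  # no match at offset 4
```

Without line_end nothing ever consumes the "\n", which is the same dead end as the original grammar.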

Can setParseAction be used deeper in the parse hierarchy?
/* dfadsfasdfasdfasdf */
comment = startcomment + SkipTo(endcomment,include=True)
comment.setParseAction( replaceWith("COMMENT") )

grammar = OneOrMore( comment | command1 | command2 )

result = grammar.transformString( inputstring )

Parse actions can be attached with setParseAction anywhere in the hierarchy, so your example should work OK. As for handling comments, you can also look at using ignore:
code format="python"
grammar = OneOrMore( command1 | command2 )
grammar.ignore( comment )
result = grammar.transformString( inputstring )
code
The reason this is important is that comments can appear even in the middle of a command.
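The value of ignore() is that the ignorable expression is checked before every element of the grammar, not only between whole commands. A rough plain-Python analogue, with hypothetical names and a made-up token set, that skips C-style comments before each token:

```python
import re

COMMENT = re.compile(r"/\*.*?\*/", re.S)   # C-style comment, may span lines
WHITESPACE = re.compile(r"\s*")
TOKEN = re.compile(r"\w+|[=;]")

def skip_ignorables(text, loc):
    """Skip whitespace and any number of comments starting at loc."""
    while True:
        loc = WHITESPACE.match(text, loc).end()
        m = COMMENT.match(text, loc)
        if m is None:
            return loc
        loc = m.end()

def tokenize(text):
    tokens, loc = [], 0
    while True:
        loc = skip_ignorables(text, loc)   # checked before EVERY token
        if loc >= len(text):
            return tokens
        m = TOKEN.match(text, loc)
        if m is None:
            raise ValueError("unexpected character at offset %d" % loc)
        tokens.append(m.group())
        loc = m.end()

print(tokenize("set /* mid-command */ x = 1; /* trailing */"))
# ['set', 'x', '=', '1', ';']
```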

Is there a best practice for parsing mixed content?
Currently I'm having a hard time using parseString to analyze a wiki paragraph containing mixed content. For example:
code
Wiki paragraphs can contain links as well as **bold** and //italic// text.
code
What would the rules for this paragraph look like, so that they also preserve the text between the wiki markup? Are there any examples I could look at?

For this kind of parsing, parseString is not the best method to use. Just for review, there are now 4 different ways to invoke a pyparsing grammar:
- parseString - parses the input string from the beginning, until a mismatch is found or the end of the grammar is reached
- scanString - a generator for partial string matching; returns the matched tokens, and the start and end locations of each match
- transformString - wrapper around scanString to apply parse actions to transform the input string
- searchString - wrapper around scanString to return a list of the matched tokens

As you have found, parseString is suitable only if you have a grammar that completely defines the content of the input text. scanString is able to "scan" through the input text, looking for matches - this is closer to what you want, since it only requires definition of pyparsing expressions for that which you are scanning for. transformString and searchString are simple wrappers around scanString, for the most common applications of scanString: converting expressions based on parseActions, and searching for matches and returning a list of matches. So for a wiki markup processor, I'd say transformString is the best fit. In fact, there is a new example on the Examples page titled simpleWiki.py. The one complication is when you get markup nested within markup, but with a little diligence, I hope you can get it worked out.
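The split between the four methods has rough counterparts in Python's re module: parseString is closest to re.match (anchored at the start), scanString to re.finditer (matches plus start and end locations), transformString to re.sub, and searchString to re.findall. The sketch below applies that analogy to the sample paragraph; the BOLD and ITALIC patterns are made up for illustration and are not taken from simpleWiki.py.

```python
import re

BOLD = re.compile(r"\*\*(.+?)\*\*")
ITALIC = re.compile(r"//(.+?)//")

sample = "Wiki paragraphs can contain links as well as **bold** and //italic// text."

def to_html(text):
    """transformString analogue: rewrite only the matched markup, keep all other text."""
    text = BOLD.sub(r"<b>\1</b>", text)
    return ITALIC.sub(r"<i>\1</i>", text)

# scanString analogue: matched tokens plus start and end locations
scanned = [(m.group(1), m.start(), m.end()) for m in BOLD.finditer(sample)]

# searchString analogue: just the list of matched tokens
found = ITALIC.findall(sample)

print(to_html(sample))
# Wiki paragraphs can contain links as well as <b>bold</b> and <i>italic</i> text.
print(scanned)   # [('bold', 45, 53)]
print(found)     # ['italic']
```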

PyParsing Support Added to Utility Mill
I thought you guys might be interested. You can now [|make web based utilities using the pyparsing module]. As an example I implemented the [|chemical formula parser example here].

I think this could be very useful for making a quick utility where you want a user to enter some string to be parsed, and easily use pyparsing to do the work. Let me know what you think.

Unicode issues
When parsing Unicode strings, PyParsing returns a mixture of unicode and str objects as a result (ASCII strings are always converted to str, others are left intact). This probably should not happen, and intermixing byte strings with Unicode strings is usually not a good idea. I suggest the following patch:
code
--- pyparsing.py.orig  2008-04-21 23:18:59.000000000 +0600
+++ pyparsing.py       2008-04-21 23:21:53.000000000 +0600
@@ -87,6 +87,11 @@
        str(obj). If that fails with a UnicodeEncodeError, then it tries unicode(obj). It
        then < returns the unicode object | encodes it with the default encoding | ... >.
     """
+
+    # Do not convert unicode to str
+    if isinstance(obj, unicode):
+        return obj
+
     try:
         # If this works, then _ustr(obj) has the same behaviour as str(obj), so
         # it won't break any existing code.
code

Equality / equivalency between grammars
I might be the first person ever to equality-test pyparsing grammars, but I need to for pyparsing_helper to work right, and it looks like ``ParserElement.__eq__`` wasn't written to support that (as of 1.5.1).

code
In [41]: Literal('a') == "a"
Out[41]: True

In [42]: Literal('a') == Literal('a')
Out[42]: False
code

I've submitted [|a patch], but in the meantime, here's a monkeypatch.

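code format="python"
import pyparsing
from pyparsing import ParserElement, StringEnd, ParseBaseException, _ustr

def _eq_monkeypatch(self, other):
    if isinstance(other, pyparsing.ParserElement):
        # structural comparison of the two elements' settings
        return self.__dict__ == other.__dict__
    elif isinstance(other, basestring):
        # a string is "equal" if this element matches all of it
        try:
            (self + StringEnd()).parseString(_ustr(other))
            return True
        except ParseBaseException:
            return False
    else:
        return super(ParserElement,self)==other

pyparsing.ParserElement.__eq__ = _eq_monkeypatch
code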


//This was fixed in pyparsing 1.5.2. -- Paul//
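For anyone needing similar behaviour in their own classes, structural equality by comparing __dict__ works recursively: comparing two dicts compares their values with ==, so nested elements use their own __eq__. A toy Python 3 sketch (Element, Lit and Seq are hypothetical, not pyparsing classes):

```python
class Element:
    """Toy grammar element (hypothetical; not pyparsing's ParserElement)."""

    def __eq__(self, other):
        if isinstance(other, Element):
            # Structural equality: same type and same configuration.
            # Comparing __dict__ recurses into sub-elements via their own __eq__.
            return self is other or (type(self) is type(other)
                                     and self.__dict__ == other.__dict__)
        return NotImplemented


class Lit(Element):
    def __init__(self, text):
        self.text = text


class Seq(Element):
    def __init__(self, *exprs):
        self.exprs = list(exprs)


print(Lit("a") == Lit("a"))                                # True
print(Seq(Lit("a"), Lit("b")) == Seq(Lit("a"), Lit("b")))  # True
print(Seq(Lit("a")) == Lit("a"))                           # False
```

In Python 3, a class that defines __eq__ without __hash__ becomes unhashable, so a real grammar class that must serve as a dict key would also need a __hash__ consistent with its equality.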

Generating EBNF-like things from pyparsing grammars?
I'd like to generate some variant of EBNF -- it doesn't need to be too strict -- from a pyparsing grammar. Has anyone tried to do such a thing?

-- Markus

[ reply from Paul ] Pyparsing's expressions are already self-describing in a quasi-BNF format. For example, here are some of my typical examples (a server name that could be a host name or an IP address), and how they look if printed out:
code
>>> integer = Word(nums)
>>> print integer
W:(0123...)

>>> hostname= Word(alphas, alphanums)
>>> print hostname
W:(abcd...,abcd...)
code

Since hostname uses different sets of characters for its initial vs. body character, it displays a two-argument format. Unfortunately, the truncation feature clips the significant difference (that the body can contain numeric digits in addition to alpha characters).

Now if we assemble these base expressions into an IP address, we see a couple of other problems:
code
>>> ip_addr = integer + '.' + integer + '.' + integer + '.' + integer
>>> print ip_addr
{{{{{{W:(0123...) "."} W:(0123...)} "."} W:(0123...)} "."} W:(0123...)}
code

We really don't want ip_addr to resolve any deeper than its component expressions. For readability's sake, pyparsing allows you to attach a name to an expression (this is //not// the same as setResultsName):
code
>>> integer.setName("integer")
integer
code

Now if we rebuild our ip_addr expression and print out its representation, things are a little better:
code
>>> ip_addr = integer + '.' + integer + '.' + integer + '.' + integer
>>> print ip_addr
{{{{{{integer "."} integer} "."} integer} "."} integer}
code

Hmmm, still some room for improvement. What we are seeing is the intermediate form that gets created by the '+' operator, which calls ``ParserElement.__add__(a,b)`` and returns And([a,b]). Since ``__add__`` can only see two elements at a time, an expression like "a + b + c" returns the nested And([And([a,b]),c]). This is where pyparsing has to do some reshuffling, since the user did not really add any such structure, and would just like things to be processed like And([a,b,c]). So pyparsing has an internal method named streamline that tries to clean things up a bit: streamline looks at expressions of like type and tries to collapse unnecessary nesting (while still preserving things like results names, grouping, etc.). If we call it, we can see the results:
code
>>> ip_addr.streamline()
{integer "." integer "." integer "." integer}
code

Now this is a //lot// cleaner! (User code rarely needs to call streamline; it is called automatically as part of the logic in parseString.)

But this is only helpful for showing the top-level expression. If we want to drill down into the parser, we'll need to peel away the names we gave the sub-expressions. See how this is done in the attached little script:
code format="python"
from pyparsing import *

integer = Word(nums).setName("integer")
ip_addr = integer + '.' + integer + '.' + integer + '.' + integer

hostname = Word(alphas, alphanums+'_').setName("hostname")

hostref = hostname | ip_addr

hostref.streamline()  # internal pyparsing method, rarely called in user code

for exprname in "hostref hostname integer".split():
    expr = locals()[exprname]
    e = expr.copy()
    # drop the name on the copy so it prints its structure instead
    if hasattr(e,"name"):
        del e.name
    print exprname,'::',e
code

Prints:
code
hostref :: {hostname | {integer "." integer "." integer "." integer}}
hostname :: W:(abcd...,abcd...)
integer :: W:(0123...)
code

This isn't a complete solution, but maybe it will give you some ideas on how to approach your problem. -- Paul
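The nesting produced by '+' and the flattening that streamline performs can be reproduced with a few toy classes in Python 3. Expr, Name and And here are hypothetical stand-ins, not pyparsing's classes; the real streamline also has to leave nested expressions alone when they carry results names or other settings, which this sketch ignores.

```python
class Expr:
    def __add__(self, other):
        # Like ParserElement.__add__: only two operands are visible at a time.
        return And([self, other])


class Name(Expr):
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class And(Expr):
    def __init__(self, exprs):
        self.exprs = list(exprs)

    def __str__(self):
        return "{" + " ".join(str(e) for e in self.exprs) + "}"

    def streamline(self):
        # Collapse nested And nodes into a single flat list of operands.
        flat = []
        for e in self.exprs:
            if isinstance(e, And):
                e.streamline()
                flat.extend(e.exprs)
            else:
                flat.append(e)
        self.exprs = flat
        return self


integer, dot = Name("integer"), Name('"."')
ip_addr = integer + dot + integer + dot + integer + dot + integer

before = str(ip_addr)
after = str(ip_addr.streamline())
print(before)  # {{{{{{integer "."} integer} "."} integer} "."} integer}
print(after)   # {integer "." integer "." integer "." integer}
```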

[Markus again]

Thanks, Paul. I should really learn to control my coding habit, since of course I got impatient while offline and now coded something that could have made really good use of streamline. Anyway, there are quite a few subtleties I'd probably have encountered even with streamline. If someone needs something like this: http://www.tfiu.de/homepage/hacks/#pyparsingToEBNF (warning: much more verbose than Paul's suggestions)-- and I'll gladly expand it if someone actually uses it.

[Ben Liles]

Would it be possible to remove the download URL from the PyPI record so that easy_install will download the tar.gz uploaded to PyPI? That way it won't have to read from Wikispaces. I'm using [|buildout] and cannot specify the full URL to get it from.

//Try it now. - Paul//

Error installing
I see in the README that Python 2.3.2 or later is required. I am running 2.3.4 on RedHat, and got this error when I tried to install:
code
[root@host pyparsing-1.5.0]# python setup.py install
Traceback (most recent call last):
  File "setup.py", line 6, in ?
    from pyparsing import __version__
  File "/var/tmp/pyparsing-1.5.0/pyparsing.py", line 2506
    matchOrder += list(e for e in self.exprs if isinstance(e,Optional) and e.expr in tmpOpt)
                         ^
SyntaxError: invalid syntax
code
Should I upgrade? [Mark]

Another syntax error raised during install (pyparsing_py3.py, line 2470)
This is my //sys.version//: 2.5.1 (r251:54863, Feb 6 2009, 19:02:12) [GCC 4.0.1 (Apple Inc. build 5465)]

The line with the raised syntax error:
code format="python"
except ParseException as err:
code
The log:
code
$ python setup.py install
running install
running build
running build_py
creating build
creating build/lib
copying pyparsing.py -> build/lib
copying pyparsing_py3.py -> build/lib
running install_lib
copying build/lib/pyparsing.py -> /Library/Python/2.5/site-packages
copying build/lib/pyparsing_py3.py -> /Library/Python/2.5/site-packages
byte-compiling /Library/Python/2.5/site-packages/pyparsing_py3.py to pyparsing_py3.pyc
  File "/Library/Python/2.5/site-packages/pyparsing_py3.py", line 2470
    except ParseException as err:
                          ^
SyntaxError: invalid syntax

running install_egg_info
Writing /Library/Python/2.5/site-packages/pyparsing-1.5.2-py2.5.egg-info
code

Anyway, the greeting.py example //does// work.

[2009-10-06]

2010/05/15: Same problem on Cygwin with Python 2.5. Replace "as" with ",", or just ignore the error, because that module is intended for Python 3 only.

I know this is random, but can we have a better page where you can comment or bring up ideas that the owners can look at? Or can I email the owners about a new idea?

//Post it to the Discussion tab on the Pyparsing Wiki home page. (Anyone can post discussion comments.)//

alphas is locale-dependent
The documentation claims that "alphas" is 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'. But in reality this is not the case! It is constructed from string.uppercase and string.lowercase, which are locale-dependent, and on my system they are full of accented characters! It obviously doesn't make sense to have a programming language whose legal identifiers vary from system to system, so why not just use the literal string 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz'? - Kef Schecter
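In Python 2, string.uppercase and string.lowercase follow the current locale, while string.ascii_uppercase and string.ascii_lowercase never do. A Python 3 sketch of the fixed alphabet proposed above, with a hypothetical Word-like check (is_word is illustrative, not pyparsing API) showing that an accented word is rejected regardless of locale. In a grammar, passing such an explicit string to Word instead of alphas pins identifiers to ASCII.

```python
import string

# Locale-independent ASCII letters, spelled out as the poster proposes.
ASCII_ALPHAS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
assert ASCII_ALPHAS == string.ascii_uppercase + string.ascii_lowercase

def is_word(text, init_chars, body_chars=None):
    """Hypothetical Word-like check: first char from init_chars, the rest from body_chars."""
    body_chars = init_chars if body_chars is None else body_chars
    return bool(text) and text[0] in init_chars and all(c in body_chars for c in text[1:])

identifier_body = ASCII_ALPHAS + string.digits + "_"
print(is_word("table1", ASCII_ALPHAS, identifier_body))  # True
print(is_word("café", ASCII_ALPHAS, identifier_body))    # False
```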