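This page contains snippets for common expressions found in pyparsing grammars. Unless a snippet shows its own imports, it assumes that the pyparsing names it uses (oneOf, Regex, Word, Combine, and so on) have already been imported. Please feel free to post your own!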


Days of the week / Months of the year

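Use the standard library's calendar module to provide lists of day and month names and abbreviations. The month sequences hold an empty string at index 0, which is why the month lists below are sliced with [1:].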
import calendar
from pyparsing import oneOf
 
monthName = oneOf( list(calendar.month_name)[1:] )
monthAbbr = oneOf( list(calendar.month_abbr)[1:] )
dayName = oneOf( list(calendar.day_name) )
dayAbbr = oneOf( list(calendar.day_abbr) )
 
# parse action to convert month_abbr to value 1-12
mname2mon = dict((m,i) for i,m in enumerate(calendar.month_abbr) if m)
monthAbbr.setParseAction(lambda t: mname2mon[t[0]])
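
The slicing and the "if m" filter in the snippet above both exist because of how the calendar module lays out its month sequences. The following is a small standard-library sketch (no pyparsing needed) that shows this layout; the actual names depend on the current locale, so only positions and counts are printed.

```python
import calendar

# Index 0 of month_name / month_abbr is an empty string, so that
# January sits at index 1; day_name / day_abbr have no such padding.
month_abbrs = list(calendar.month_abbr)
day_abbrs = list(calendar.day_abbr)
print(repr(month_abbrs[0]))              # the empty placeholder
print(len(month_abbrs), len(day_abbrs))  # placeholder + 12 months, 7 days

# Same mapping as in the snippet above: abbreviation -> month number 1-12.
# The "if m" filter drops the empty placeholder at index 0.
mname2mon = dict((m, i) for i, m in enumerate(calendar.month_abbr) if m)
print(mname2mon[month_abbrs[1]], mname2mon[month_abbrs[12]])
```

Without the [1:] slice, the empty string would be handed to oneOf as one of the alternatives, which is not what is wanted here.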

Chemical symbols of the elements

All of the elements, in a MatchFirst expression (oneOf will reorder the entries as necessary to make sure the "H" does not mask "He" or "Hg", for example).
element = oneOf( """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
            Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
            As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
            Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
            Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
            Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
            Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
            Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""" )
or, as a single regular expression:
element = Regex("A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|"
                "E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|"
                "M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|"
                "S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]")
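
As a sanity check on the two definitions above, here is a sketch that uses only the standard re module (the same pattern string that is passed to Regex). It first shows the masking problem that oneOf guards against, then tests every string made of one capital letter followed by up to two lowercase letters to confirm that the regular expression accepts exactly the symbols in the list. The variable names are illustrative only.

```python
import re
from itertools import product
from string import ascii_lowercase, ascii_uppercase

symbols = """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
    Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
    As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
    Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
    Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
    Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
    Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
    Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""".split()

element_re = re.compile(
    "A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|"
    "E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|"
    "M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|"
    "S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]")

# Alternatives are tried left to right, so listing "H" first masks "He".
naive = re.compile("|".join(["H", "He", "Hg"]))
masked = naive.match("He").group()
print(masked)   # H

# Every candidate: one capital letter followed by 0, 1 or 2 lowercase letters.
candidates = [
    upper + "".join(tail)
    for upper in ascii_uppercase
    for n in range(3)
    for tail in product(ascii_lowercase, repeat=n)
]
matched = set(c for c in candidates if element_re.fullmatch(c))
print(matched == set(symbols), len(matched))
```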

UUIDs

Parses UUIDs, such as 'db9674c4-72a9-4ab9-9ddd-1d641a37cde4'.
_hexStr = lambda n : Word(hexnums,exact=n)
uuid = Combine(_hexStr(8)+"-"+_hexStr(4)+"-"+_hexStr(4)+"-"+_hexStr(4)+"-"+_hexStr(12))
or, using the multiplication operator for repetition (this version requires pyparsing 1.4.10 or later):
_hexStr = lambda n : Word(hexnums,exact=n)
uuid = Combine(_hexStr(8) + ("-"+_hexStr(4))*3 + "-" + _hexStr(12))
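
For comparison, the same structure can be written as a plain regular expression, with {3} playing the role of the *3 repetition. This is only a standard-library sketch, not part of pyparsing. Note that the snippets above bind the name uuid, which hides the standard uuid module if both are used in one program, so the module is imported under an alias here (the alias is an arbitrary choice).

```python
import re
import uuid as uuid_module   # aliased, since the snippets above bind "uuid"

HEX = "[0-9A-Fa-f]"   # the same characters as pyparsing's hexnums
# {3} corresponds to ("-" + _hexStr(4))*3 in the pyparsing version
uuid_re = re.compile(HEX + "{8}(?:-" + HEX + "{4}){3}-" + HEX + "{12}")

sample = "db9674c4-72a9-4ab9-9ddd-1d641a37cde4"
print(bool(uuid_re.fullmatch(sample)))        # True
print(bool(uuid_re.fullmatch(sample[:-1])))   # False: last group is too short

# Once the text is known to be well formed, it can be turned into a UUID object.
parsed = uuid_module.UUID(sample)
print(parsed.version)
```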

MAC Address

Parses MAC addresses written as six pairs of hex digits, separated either by hyphens or by colons; the same separator must be used throughout. (requires pyparsing 1.4.10 or later)
_hex2 = Word(hexnums,exact=2)
macAddr = Combine( _hex2 + (("-" + _hex2)*5 | (":" + _hex2)*5) )
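
Because each alternative repeats its own separator five times, an address that mixes "-" and ":" does not match. The sketch below mirrors the expression with the standard re module to show this; the sample addresses are made up.

```python
import re

HEX2 = "[0-9A-Fa-f]{2}"
# first pair, then either five "-pair" groups or five ":pair" groups
mac_re = re.compile(HEX2 + "(?:(?:-" + HEX2 + "){5}|(?::" + HEX2 + "){5})")

samples = ["00-1A-2b-3C-4d-5E", "00:1a:2B:3c:4D:5e", "00-1A:2B-3C:4D-5E"]
results = [bool(mac_re.fullmatch(s)) for s in samples]
for s, ok in zip(samples, results):
    print(s, ok)
```

Only the last sample, with mixed separators, is rejected.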

Timezone names

Worldwide and US-only lists of timezone names. (from http://www.worldtimezone.com/wtz-names/timezonenames.html)
tzname = oneOf("""ACDT ACST ADT AEDT AEST AFT AHDT AHST AKDT AKST
    AMST AMT ANAST ANAT ART AST AT AWDT AWST AZOST AZOT AZST AZT
    BADT BAT BDST BDT BET BNT BORT BOT BRA BST BT BTT CAT CCT CDT
    CEST CET CHADT CHAST CKT CLST CLT COT CST CUT CVT CWT CXT DAVT
    DDUT DNT DST EASST EAST EAT ECT EDT EEST EET EGST EGT EMT EST
    FDT FJST FJT FKST FKT FST FWT GALT GAMT GEST GET GFT GILT GMT
    GST GT GYT GZ HAA HAC HADT HAE HAP HAR HAST HAT HAY HDT HFE
    HFH HG HKT HL HNA HNC HNE HNP HNR HNT HNY HOE HST ICT IDLE
    IDLW IDT IOT IRDT IRKST IRKT IRST IRT IST IT ITA JAVT JAYT JST
    JT KDT KGST KGT KOST KRAST KRAT KST LHDT LHST LIGT LINT LKT
    LST LT MAGST MAGT MAL MART MAT MAWT MDT MED MEDST MEST MESZ
    MET MEWT MEX MEZ MHT MMT MPT MSD MSK MSKS MST MT MUT MVT MYT
    NCT NDT NFT NOR NOVST NOVT NPT NRT NST NSUT NT NUT NZDT NZST
    NZT OESZ OEZ OMSST OMST OZ PDT PET PETST PETT PGT PHOT PHT PKT
    PMDT PMT PNT PONT PST PWT PYST PYT RET ROK SADT SAST SBT SCT
    SET SGT SRT SST SWT SZ TAI TFT THA THAT TJT TKT TMT TOT TRUK
    TST TUC TVT ULAST ULAT UT UTC UTZ UYT UZT VET VLAST VLAT VTZ
    VUT WAKT WAST WAT WCT WEST WESZ WET WEZ WFT WGST WGT WIB WITA
    WIT WST WTZ WUT WZ YAKST YAKT YAPT YDT YEKST YEKT YST""")
 
us_tzname = oneOf("EST EDT CST CDT MST MDT PST PDT AKST AKDT HAST HADT HST")
# or, equivalently, as a regular expression
us_tzname = Regex("(([ECMP]|HA|AK)[SD]|HS)T")
 
lower48us_tzname = oneOf("EST EDT CST CDT MST MDT PST PDT")
# or, equivalently, as a regular expression
lower48us_tzname = Regex("[ECMP][SD]T")

US State postal abbreviations

Postal abbreviations for US states and territories.
stateAbbreviation = oneOf("""AA AE AK AL AP AR AS AZ CA CO CT DC DE
    FL FM GA GU HI IA ID IL IN KS KY LA MA MD ME MH MI MN MO MP MS
    MT NC ND NE NH NJ NM NV NY OH OK OR PA PR PW RI SC SD TN TX UT
    VA VI VT WA WI WV WY""")
# or, equivalently, as a regular expression
stateAbbreviation = Regex(r"A[AEKLPSRZ]|C[OAT]|D[EC]|F[LM]|G[UA]|HI|"
    r"I[LNAD]|K[SY]|LA|M[EDAONIHTPS]|N[HJMCDEYV]|O[KHR]|P[ARW]|RI|"
    r"S[CD]|T[XN]|UT|V[AIT]|W[AIVY]")
 
states = {
    'AA' : 'Armed Forces Americas (except Canada)',
    'AE' : 'Armed Forces Europe, Middle East, Africa, and Canada',
    'AK' : 'ALASKA',
    'AL' : 'ALABAMA',
    'AP' : 'Armed Forces Pacific',
    'AR' : 'ARKANSAS',
    'AS' : 'AMERICAN SAMOA',
    'AZ' : 'ARIZONA',
    'CA' : 'CALIFORNIA',
    'CO' : 'COLORADO',
    'CT' : 'CONNECTICUT',
    'DC' : 'DISTRICT OF COLUMBIA',
    'DE' : 'DELAWARE',
    'FL' : 'FLORIDA',
    'FM' : 'FEDERATED STATES OF MICRONESIA',
    'GA' : 'GEORGIA',
    'GU' : 'GUAM',
    'HI' : 'HAWAII',
    'IA' : 'IOWA',
    'ID' : 'IDAHO',
    'IL' : 'ILLINOIS',
    'IN' : 'INDIANA',
    'KS' : 'KANSAS',
    'KY' : 'KENTUCKY',
    'LA' : 'LOUISIANA',
    'MA' : 'MASSACHUSETTS',
    'MD' : 'MARYLAND',
    'ME' : 'MAINE',
    'MH' : 'MARSHALL ISLANDS',
    'MI' : 'MICHIGAN',
    'MN' : 'MINNESOTA',
    'MO' : 'MISSOURI',
    'MP' : 'NORTHERN MARIANA ISLANDS',
    'MS' : 'MISSISSIPPI',
    'MT' : 'MONTANA',
    'NC' : 'NORTH CAROLINA',
    'ND' : 'NORTH DAKOTA',
    'NE' : 'NEBRASKA',
    'NH' : 'NEW HAMPSHIRE',
    'NJ' : 'NEW JERSEY',
    'NM' : 'NEW MEXICO',
    'NV' : 'NEVADA',
    'NY' : 'NEW YORK',
    'OH' : 'OHIO',
    'OK' : 'OKLAHOMA',
    'OR' : 'OREGON',
    'PA' : 'PENNSYLVANIA',
    'PR' : 'PUERTO RICO',
    'PW' : 'PALAU',
    'RI' : 'RHODE ISLAND',
    'SC' : 'SOUTH CAROLINA',
    'SD' : 'SOUTH DAKOTA',
    'TN' : 'TENNESSEE',
    'TX' : 'TEXAS',
    'UT' : 'UTAH',
    'VA' : 'VIRGINIA',
    'VI' : 'VIRGIN ISLANDS',
    'VT' : 'VERMONT',
    'WA' : 'WASHINGTON',
    'WI' : 'WISCONSIN',
    'WV' : 'WEST VIRGINIA',
    'WY' : 'WYOMING',
    }
 
# add parse action to convert abbreviation to full state name
stateAbbreviation.setParseAction(lambda t:states[t[0]])

E-mail addresses

E-mail addresses are notorious for their many tortuous and arcane forms: e-mail dates from the early days of the internet and has evolved through many "standards". The expression below comes from http://www.regular-expressions.info/email.html and covers most common e-mail addresses in use today. Note that its {2,4} limit on the final part does not cover top-level domains longer than four letters.
emailExpr = Regex(r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<domain>[A-Za-z]{2,4})")

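pyparsing translates the named groups of the regular expression into results names. For example:
print(emailExpr.parseString("paul@users.sourceforge.net").dump())
prints output like the following (the exact formatting of the values can vary between pyparsing versions):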
['paul@users.sourceforge.net']
- domain: net
- hostname: users.sourceforge
- user: paul
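
The results names come straight from the named groups of the pattern, which can be seen by running the same pattern through the standard re module. The sketch below also shows what happens with a top-level domain longer than four letters; the second address is a made-up example.

```python
import re

email_re = re.compile(
    r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)"
    r"\.(?P<domain>[A-Za-z]{2,4})")

# The named groups are what pyparsing exposes as results names.
fields = email_re.match("paul@users.sourceforge.net").groupdict()
print(fields)

# {2,4} stops after four letters, so a longer top-level domain is cut short.
truncated = email_re.match("someone@example.museum").group()
print(truncated)   # someone@example.muse
```

If longer top-level domains matter for your input, the upper bound in {2,4} would need to be raised or removed.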


ANSI Terminal Escape Sequences

Back in the day, computers were huge distant servers walled off in The Computer Room. On your desk, you probably had a "dumb" terminal, like a VT100. These terminals supported a special language of escape sequences to move the cursor about, clear parts of the screen, change the screen scroll region, display in color, flashing, bold, or reverse, and so on. Some sequences would even flash the lights on the keyboard. Sometimes you would retrieve a log of one of these terminal sessions, and it would be littered with the control sequences. This parser is generic enough to match all or most of them.
ESC = Literal('\x1b')
integer = Word(nums)
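# combine=True keeps the ';' separators in the combined token;
# by default delimitedList suppresses its delimiters
escapeSeq = Combine(ESC + '[' +
               Optional(delimitedList(integer, ';', combine=True)) + oneOf(list(alphas)))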
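
To illustrate the log-cleaning use case described above, here is a rough regular-expression counterpart of escapeSeq, written with the standard re module so that it runs without pyparsing. It is a sketch under the same assumptions as the parser (ESC, "[", optional ";"-separated integers, one ASCII letter), and the log line is invented for the example.

```python
import re

# ESC, '[', optional ';'-separated integers, then a single ASCII letter
escape_seq_re = re.compile(r"\x1b\[(?:[0-9]+(?:;[0-9]+)*)?[A-Za-z]")

log_line = "\x1b[1;32mOK\x1b[0m job finished\x1b[K"   # made-up sample

found = escape_seq_re.findall(log_line)
print(found)

# Removing every escape sequence leaves only the readable text.
cleaned = escape_seq_re.sub("", log_line)
print(cleaned)   # OK job finished
```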