Helpful Expressions

This page contains some snippets for common expressions found in pyparsing grammars. Please feel free to post your own!


=Days of the week / Months of the year=
Use the built-in calendar module to provide lists of day and month names and abbreviations.
```python
import calendar
from pyparsing import oneOf

# month_name and month_abbr start with an empty string at index 0, hence [1:]
monthName = oneOf( list(calendar.month_name)[1:] )
monthAbbr = oneOf( list(calendar.month_abbr)[1:] )
dayName = oneOf( list(calendar.day_name) )
dayAbbr = oneOf( list(calendar.day_abbr) )

# parse action to convert month_abbr to value 1-12
mname2mon = dict((m,i) for i,m in enumerate(calendar.month_abbr) if m)
monthAbbr.setParseAction(lambda t: mname2mon[t[0]])
```
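As a usage sketch, the day and month expressions can be combined into a parser for dates written like "Mon, 05 Jan 2009". The date layout and the names day, year, and date_expr are hypothetical. The calendar module returns names for the current locale, so this assumes the default English (C) locale.

```python
import calendar
from pyparsing import Suppress, Word, nums, oneOf

dayAbbr = oneOf(list(calendar.day_abbr))
monthAbbr = oneOf(list(calendar.month_abbr)[1:])
mname2mon = dict((m, i) for i, m in enumerate(calendar.month_abbr) if m)
monthAbbr.setParseAction(lambda t: mname2mon[t[0]])

# hypothetical layout: "<weekday>, <day> <month> <year>"
day = Word(nums).setParseAction(lambda t: int(t[0]))
year = Word(nums).setParseAction(lambda t: int(t[0]))
date_expr = (dayAbbr("weekday") + Suppress(",") + day("day")
             + monthAbbr("month") + year("year"))

result = date_expr.parseString("Mon, 05 Jan 2009")
print(result["weekday"], result["day"], result["month"], result["year"])  # Mon 5 1 2009
```

Because monthAbbr carries its parse action, the "month" field comes back as the number 1 rather than the string "Jan".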

=Chemical symbols of the elements=
All of the elements, as a single oneOf expression, which behaves like a MatchFirst. oneOf reorders the entries as necessary so that, for example, "H" does not mask "He" or "Hg". Elements 112 through 118 appear under their temporary systematic names, Uub through Uuo.
```python
from pyparsing import oneOf

element = oneOf( """H He Li Be B C N O F Ne Na Mg Al Si P S Cl
           Ar K Ca Sc Ti V Cr Mn Fe Co Ni Cu Zn Ga Ge
           As Se Br Kr Rb Sr Y Zr Nb Mo Tc Ru Rh Pd Ag
           Cd In Sn Sb Te I Xe Cs Ba Lu Hf Ta W Re Os
           Ir Pt Au Hg Tl Pb Bi Po At Rn Fr Ra Lr Rf
           Db Sg Bh Hs Mt Ds Rg Uub Uut Uuq Uup Uuh Uus
           Uuo La Ce Pr Nd Pm Sm Eu Gd Tb Dy Ho Er Tm
           Yb Ac Th Pa U Np Pu Am Cm Bk Cf Es Fm Md No""" )
```
or
```python
from pyparsing import Regex

element = Regex("A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|"
                "E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|"
                "M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|"
                "S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]")
```
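The element expression can be paired with an optional count to parse simple chemical formulas such as "H2O". This is a minimal sketch: it uses the Regex form of element, the names count, term, formula, and atom_counts are hypothetical, and it does not handle parenthesized groups such as Ca(OH)2.

```python
from pyparsing import Group, OneOrMore, Optional, Regex, Word, nums

# Regex form of the element expression
element = Regex("A[cglmrstu]|B[aehikr]?|C[adeflmorsu]?|D[bsy]|"
                "E[rsu]|F[emr]?|G[ade]|H[efgos]?|I[nr]?|Kr?|L[airu]|"
                "M[dgnot]|N[abdeiop]?|Os?|P[abdmortu]?|R[abefghnu]|"
                "S[bcegimnr]?|T[abcehilm]|Uu[bhopqst]|U|V|W|Xe|Yb?|Z[nr]")

# hypothetical helpers: an omitted count means a single atom
count = Optional(Word(nums), default="1")
term = Group(element + count)
formula = OneOrMore(term)

def atom_counts(text):
    """Return a dict mapping each element symbol to its total atom count."""
    totals = {}
    for symbol, n in formula.parseString(text, parseAll=True):
        totals[symbol] = totals.get(symbol, 0) + int(n)
    return totals

print(atom_counts("C6H12O6"))  # {'C': 6, 'H': 12, 'O': 6}
```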

=UUIDs=
Parses UUIDs written in the standard 8-4-4-4-12 hexadecimal-digit form.
```python
from pyparsing import Combine, Word, hexnums

_hexStr = lambda n : Word(hexnums,exact=n)
uuid = Combine(_hexStr(8)+"-"+_hexStr(4)+"-"+_hexStr(4)+"-"+_hexStr(4)+"-"+_hexStr(12))
```
or //(this version requires pyparsing 1.4.10)//
```python
from pyparsing import Combine, Word, hexnums

_hexStr = lambda n : Word(hexnums,exact=n)
uuid = Combine(_hexStr(8) + ("-"+_hexStr(4))*3 + "-" + _hexStr(12))
```
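To get uuid.UUID objects instead of strings, a parse action can convert each match. In this sketch the expression is renamed uuidExpr so it does not shadow the standard-library uuid module, and find_uuids is a hypothetical helper.

```python
import uuid
from pyparsing import Combine, Word, hexnums

_hexStr = lambda n: Word(hexnums, exact=n)
# renamed from "uuid" so the standard-library module stays usable
uuidExpr = Combine(_hexStr(8) + ("-" + _hexStr(4)) * 3 + "-" + _hexStr(12))
# convert the combined text into a uuid.UUID object
uuidExpr.setParseAction(lambda t: uuid.UUID(t[0]))

def find_uuids(text):
    """Return every UUID found in text as a uuid.UUID object."""
    return [match[0] for match in uuidExpr.searchString(text)]
```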

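=MAC Address=
Parses MAC addresses written as 6 pairs of hex digits, separated consistently by either dashes or colons. //(requires pyparsing 1.4.10)//
```python
from pyparsing import Combine, Word, hexnums

_hex2 = Word(hexnums,exact=2)
# the first pair, then 5 more pairs that all use the same separator
macAddr = Combine( _hex2 + (("-" + _hex2)*5 | (":" + _hex2)*5) )
```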
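A sketch that normalizes every match to lower-case, colon-separated form and rejects addresses that mix separators. The normalization rule and the is_mac helper are assumptions for illustration, not part of the original snippet.

```python
from pyparsing import Combine, ParseException, Word, hexnums

_hex2 = Word(hexnums, exact=2)
macAddr = Combine(_hex2 + (("-" + _hex2) * 5 | (":" + _hex2) * 5))
# hypothetical normalization: "00-1A-..." becomes "00:1a:..."
macAddr.setParseAction(lambda t: t[0].replace("-", ":").lower())

def is_mac(text):
    """True if the whole of text is a single MAC address."""
    try:
        macAddr.parseString(text, parseAll=True)
        return True
    except ParseException:
        return False
```

Mixed separators fail because each alternative of the MatchFirst repeats only one separator character.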

=Timezone names=
Worldwide and US-only lists of timezone names.
```python
from pyparsing import Regex, oneOf

tzname = oneOf("""ACDT ACST ADT AEDT AEST AFT AHDT AHST AKDT AKST
    AMST AMT ANAST ANAT ART AST AT AWDT AWST AZOST AZOT AZST AZT
    BADT BAT BDST BDT BET BNT BORT BOT BRA BST BT BTT CAT CCT CDT
    CEST CET CHADT CHAST CKT CLST CLT COT CST CUT CVT CWT CXT DAVT
    DDUT DNT DST EASST EAST EAT ECT EDT EEST EET EGST EGT EMT EST
    FDT FJST FJT FKST FKT FST FWT GALT GAMT GEST GET GFT GILT GMT
    GST GT GYT GZ HAA HAC HADT HAE HAP HAR HAST HAT HAY HDT HFE
    HFH HG HKT HL HNA HNC HNE HNP HNR HNT HNY HOE HST ICT IDLE
    IDLW IDT IOT IRDT IRKST IRKT IRST IRT IST IT ITA JAVT JAYT JST
    JT KDT KGST KGT KOST KRAST KRAT KST LHDT LHST LIGT LINT LKT
    LST LT MAGST MAGT MAL MART MAT MAWT MDT MED MEDST MEST MESZ
    MET MEWT MEX MEZ MHT MMT MPT MSD MSK MSKS MST MT MUT MVT MYT
    NCT NDT NFT NOR NOVST NOVT NPT NRT NST NSUT NT NUT NZDT NZST
    NZT OESZ OEZ OMSST OMST OZ PDT PET PETST PETT PGT PHOT PHT PKT
    PMDT PMT PNT PONT PST PWT PYST PYT RET ROK SADT SAST SBT SCT
    SET SGT SRT SST SWT SZ TAI TFT THA THAT TJT TKT TMT TOT TRUK
    TST TUC TVT ULAST ULAT UT UTC UTZ UYT UZT VET VLAST VLAT VTZ
    VUT WAKT WAST WAT WCT WEST WESZ WET WEZ WFT WGST WGT WIB WITA
    WIT WST WTZ WUT WZ YAKST YAKT YAPT YDT YEKST YEKT YST""")

us_tzname = oneOf("EST EDT CST CDT MST MDT PST PDT AKST AKDT HAST HADT HST")
# or, as an equivalent regular expression:
us_tzname = Regex("(([ECMP]|HA|AK)[SD]|HS)T")

# contiguous ("lower 48") US states only
lower48us_tzname = oneOf("EST EDT CST CDT MST MDT PST PDT")
# or, as an equivalent regular expression:
lower48us_tzname = Regex("[ECMP][SD]T")
```
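One way to put the US names to work is a parse action that turns each abbreviation into a fixed-offset datetime.timezone. The offset table is an addition for this sketch, not part of the original snippet, and fixed offsets ignore any daylight-saving rules beyond what the abbreviation itself implies.

```python
from datetime import timedelta, timezone
from pyparsing import Regex

# assumed table of standard UTC offsets, in hours
US_TZ_OFFSETS = {
    "EST": -5, "EDT": -4, "CST": -6, "CDT": -5,
    "MST": -7, "MDT": -6, "PST": -8, "PDT": -7,
    "AKST": -9, "AKDT": -8, "HAST": -10, "HADT": -9, "HST": -10,
}

us_tzname = Regex("(([ECMP]|HA|AK)[SD]|HS)T")
# replace the matched abbreviation with a tzinfo object named after it
us_tzname.setParseAction(
    lambda t: timezone(timedelta(hours=US_TZ_OFFSETS[t[0]]), t[0]))

tz = us_tzname.parseString("PDT")[0]
```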

=US State postal abbreviations=
Postal abbreviations for US states and territories, including the Armed Forces codes AA, AE, and AP.
```python
from pyparsing import Regex, oneOf

stateAbbreviation = oneOf("""AA AE AK AL AP AR AS AZ CA CO CT DC DE
    FL FM GA GU HI IA ID IL IN KS KY LA MA MD ME MH MI MN MO MP MS
    MT NC ND NE NH NJ NM NV NY OH OK OR PA PR PW RI SC SD TN TX UT
    VA VI VT WA WI WV WY""")
# or, as an equivalent regular expression:
stateAbbreviation = Regex(r"A[AEKLPSRZ]|C[OAT]|D[EC]|F[LM]|G[UA]|HI|"
    r"I[LNAD]|K[SY]|LA|M[EDAONIHTPS]|N[HJMCDEYV]|O[KHR]|P[ARW]|RI|"
    r"S[CD]|T[XN]|UT|V[AIT]|W[AIVY]")

states = {
    'AA' : 'Armed Forces Americas (except Canada)',
    'AE' : 'Armed Forces Middle East',
    'AK' : 'ALASKA', 'AL' : 'ALABAMA', 'AP' : 'Armed Forces Pacific',
    'AR' : 'ARKANSAS', 'AS' : 'AMERICAN SAMOA', 'AZ' : 'ARIZONA',
    'CA' : 'CALIFORNIA', 'CO' : 'COLORADO', 'CT' : 'CONNECTICUT',
    'DC' : 'DISTRICT OF COLUMBIA', 'DE' : 'DELAWARE', 'FL' : 'FLORIDA',
    'FM' : 'FEDERATED STATES OF MICRONESIA', 'GA' : 'GEORGIA', 'GU' : 'GUAM',
    'HI' : 'HAWAII', 'IA' : 'IOWA', 'ID' : 'IDAHO', 'IL' : 'ILLINOIS',
    'IN' : 'INDIANA', 'KS' : 'KANSAS', 'KY' : 'KENTUCKY', 'LA' : 'LOUISIANA',
    'MA' : 'MASSACHUSETTS', 'MD' : 'MARYLAND', 'ME' : 'MAINE',
    'MH' : 'MARSHALL ISLANDS', 'MI' : 'MICHIGAN', 'MN' : 'MINNESOTA',
    'MO' : 'MISSOURI', 'MP' : 'NORTHERN MARIANA ISLANDS', 'MS' : 'MISSISSIPPI',
    'MT' : 'MONTANA', 'NC' : 'NORTH CAROLINA', 'ND' : 'NORTH DAKOTA',
    'NE' : 'NEBRASKA', 'NH' : 'NEW HAMPSHIRE', 'NJ' : 'NEW JERSEY',
    'NM' : 'NEW MEXICO', 'NV' : 'NEVADA', 'NY' : 'NEW YORK', 'OH' : 'OHIO',
    'OK' : 'OKLAHOMA', 'OR' : 'OREGON', 'PA' : 'PENNSYLVANIA',
    'PR' : 'PUERTO RICO', 'PW' : 'PALAU', 'RI' : 'RHODE ISLAND',
    'SC' : 'SOUTH CAROLINA', 'SD' : 'SOUTH DAKOTA', 'TN' : 'TENNESSEE',
    'TX' : 'TEXAS', 'UT' : 'UTAH', 'VA' : 'VIRGINIA', 'VI' : 'VIRGIN ISLANDS',
    'VT' : 'VERMONT', 'WA' : 'WASHINGTON', 'WI' : 'WISCONSIN',
    'WV' : 'WEST VIRGINIA', 'WY' : 'WYOMING',
}

# add parse action to convert abbreviation to full state name
stateAbbreviation.setParseAction(lambda t:states[t[0]])
```
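As a usage sketch, the state expression can anchor a simple "City, ST ZIP" line parser. The city and ZIP patterns and the names city, zip_code, and city_state_zip are hypothetical, and the city pattern only allows letters, spaces, periods, apostrophes, and hyphens.

```python
from pyparsing import Regex, Suppress

state_abbr = Regex(r"A[AEKLPSRZ]|C[OAT]|D[EC]|F[LM]|G[UA]|HI|"
                   r"I[LNAD]|K[SY]|LA|M[EDAONIHTPS]|N[HJMCDEYV]|O[KHR]|P[ARW]|RI|"
                   r"S[CD]|T[XN]|UT|V[AIT]|W[AIVY]")
# assumed formats: 5-digit ZIP with optional +4 extension
zip_code = Regex(r"\d{5}(?:-\d{4})?")
city = Regex(r"[A-Za-z][A-Za-z .'-]*[A-Za-z]")

city_state_zip = (city("city") + Suppress(",")
                  + state_abbr("state") + zip_code("zip"))

result = city_state_zip.parseString("Springfield, IL 62701", parseAll=True)
```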

=E-mail addresses=
E-mail addresses are notorious for their many tortuous and arcane forms: e-mail has been around since the early days of the internet and has evolved through many "standards". The expression below comes from http://www.regular-expressions.info/email.html and covers most of the e-mail addresses in common use today.
```python
from pyparsing import Regex

emailExpr = Regex(r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)\.(?P<domain>[A-Za-z]{2,4})")
```

Pyparsing turns the named groups in the regular expression into results names. For example:
```python
print(emailExpr.parseString("paul@users.sourceforge.net").dump())
```
prints out:
```
['paul@users.sourceforge.net']
- domain: net
- hostname: users.sourceforge
- user: paul
```
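Because the named groups become results names, searchString can pull every address out of free text and expose its parts by name. The find_emails helper is hypothetical.

```python
from pyparsing import Regex

emailExpr = Regex(r"(?P<user>[A-Za-z0-9._%+-]+)@(?P<hostname>[A-Za-z0-9.-]+)"
                  r"\.(?P<domain>[A-Za-z]{2,4})")

def find_emails(text):
    """Return (user, hostname, domain) tuples for each address found in text."""
    return [(m["user"], m["hostname"], m["domain"])
            for m in emailExpr.searchString(text)]
```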

=ANSI Terminal Escape Sequences=
Back in the day, computers were huge, distant servers walled off in The Computer Room. On your desk you probably had a "dumb" terminal, such as a VT100. These terminals understood a special language of escape sequences to move the cursor about, clear parts of the screen, change the scroll region, display text in color, blinking, bold, or reverse video, and so on. Some sequences would even flash the lights on the keyboard. If you retrieved a log of one of these terminal sessions, it would be littered with these control sequences. The parser below is generic enough to match most of them: an ESC character, a "[", an optional semicolon-separated list of integers, and a final letter.
```python
from pyparsing import Combine, Literal, Optional, Word, alphas, delimitedList, nums, oneOf

ESC = Literal('\x1b')
integer = Word(nums)
# combine=True keeps the ';' separators in the combined text; by default
# delimitedList suppresses them, so "1;31" would come out as "131"
escapeSeq = Combine(ESC + '[' +
              Optional(delimitedList(integer, ';', combine=True)) +
              oneOf(list(alphas)))
```
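A common use for this expression is cleaning up such logs. Suppressing the expression makes each match contribute no tokens, so transformString copies only the text between the escape sequences. The strip_ansi helper is a hypothetical name for this sketch.

```python
from pyparsing import Combine, Literal, Optional, Word, alphas, delimitedList, nums, oneOf

ESC = Literal("\x1b")
integer = Word(nums)
escapeSeq = Combine(ESC + "[" +
                    Optional(delimitedList(integer, ";", combine=True)) +
                    oneOf(list(alphas)))

def strip_ansi(text):
    """Return text with every matched escape sequence removed."""
    return escapeSeq.suppress().transformString(text)

clean = strip_ansi("\x1b[1;31mERROR\x1b[0m: disk full\x1b[K")
```

Sequences outside this grammar, such as the private-mode form ESC [ ? 25 l, are left in place.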