Regex replace function includes too much text

Problem description:

I'm new to Python. My script (below) contains a function named "fn_regex_raw_date_string" that is meant to convert a "raw" date string like this: Mon, Oct 31, 2011 at 8:15 PM into a date string like this: _2011-Oct-31_PM_8-15_

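Question no. 1: When the "raw" date string contains extraneous characters, e.g. (xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy), how should I modify my regular expression routine to exclude the extraneous characters?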

I was tempted to remove my comments from the script below to make it 
    simpler to read, but I thought it might be more helpful for me to leave 
    them in the script. 

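Question no. 2: I suspect I should write another function that replaces the "Oct" in "_2011-Oct-31_PM_8-15_" with "10". But I can't help wondering whether there is some way to include that functionality in my fn_regex_raw_date_string function.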

Any help would be much appreciated.

Thank you, Marceepoo

import sys 
import re, pdb 
#pdb.set_trace() 

def fn_get_datestring_sysarg(): 
    this_scriptz_FULLName = sys.argv[0] 
    try: 
        date_string_raw = sys.argv[1] 
    except IndexError: 
        date_string_raw_error = this_scriptz_FULLName + ': sys.argv[1] error: No command line argument supplied' 
        print(date_string_raw_error) 
        raise  # without an argument there is nothing to return 
    return date_string_raw 

def fn_regex_raw_date_string(date_string_raw): 
    # Do re replacements 
    # p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\FixCodeFromLegislaturezCalifCode_MikezCode.py 
    # see also (fnmatch) p:\Data\VB\Python_MarcsPrgs\Python_ItWorks\bookmarkPDFs.aab.py 

    #srchstring = r"(.?+)(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)(.?+)" 
    srchstring = r"(Sun|Mon|Tue|Wed|Thu|Fri|Sat)(, )(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)( )([\d]{1,2})(, )([\d]{4})( at )([\d]{1,2})(\:)([\d]{1,2})( )(A|P)(M)" 

    srchstring = re.compile(srchstring)  
    replacement = r"_\7-\3-\5_\13M_\9-\11_" 
    #replacement = r"_\8-\4-\6_\14M_\10-\12_"  
    regex_raw_date_string = srchstring.sub(replacement, date_string_raw) 

    return regex_raw_date_string 

    # Mon, Oct 31, 2011 at 8:15 PM 
if __name__ == '__main__': 
    try: 
        this_scriptz_FULLName = sys.argv[0] 
        date_string_raw = fn_get_datestring_sysarg() 
        date_string_mbh = fn_regex_raw_date_string(date_string_raw) 
        print(date_string_mbh) 
    except Exception: 
        print('error occurred - fn_get_datestring_sysarg()') 

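This code uses a regular expression to strip everything from the start of the string up to the first abbreviated weekday name, and everything that follows AM or PM through the end of the string.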
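As a rough illustration of that stripping step on its own, here is a minimal sketch. The hardcoded English weekday list and the names WEEKDAYS and strip_junk are assumptions made for this example, not part of the script.

```python
import re

# Assumption: English weekday abbreviations, hardcoded for illustration.
WEEKDAYS = "Mon|Tue|Wed|Thu|Fri|Sat|Sun"

# First alternative: everything from the start of the string up to
# (but not including) the first weekday abbreviation.
# Second alternative: everything that follows AM or PM, through the end.
strip_junk = re.compile(r"^.*?(?=" + WEEKDAYS + r")|(?<=AM|PM).*?$")

raw = "xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy"
cleaned = strip_junk.sub("", raw)
print(cleaned)  # Mon, Oct 31, 2011 at 8:15 PM
```

A string that carries no junk passes through unchanged, because both alternatives can then only match an empty span.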

The script below then calls datetime.strptime(date_str, date_format), which does the hard work of parsing and gives us a datetime instance:

from datetime import datetime 

import calendar 
import re 

# ------------------------------------- 

# _months = "|".join(calendar.month_abbr[1:]) 
_weekdays = "|".join(calendar.day_abbr) 

_clean_regex = re.compile(r""" 
    ^
    .*? 
    (?=""" + _weekdays + """) 
    | 
    (?<=AM|PM) 
    .*? 
    $ 
""", re.X) 

# ------------------------------------- 

def parseRawDateString(raw_date_str): 
    try: 
        # Strip the junk around the date, then let strptime do the parsing. 
        date_str = _clean_regex.sub("", raw_date_str) 
        return datetime.strptime(date_str, "%a, %b %d, %Y at %I:%M %p") 
    except ValueError: 
        print("Error parsing date from '{}'!".format(raw_date_str)) 
        raise  # re-raise with the original traceback 

# ------------------------------------- 

if __name__ == "__main__": 
    from sys import argv 

    s = argv[1] if len(argv) > 1 else "xxxxxMon, Oct 31, 2011 at 8:15 PMyyyyyy" 

    print("Raw date:  '{}'".format(s)) 
    d = parseRawDateString(s) 
    print("datetime object:") 
    print(d) 
    print("Formatted date: '{}'".format(d.strftime("%A, %d %B %Y @ %I:%M %p")))
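
For Question no. 2, once a datetime instance exists, the month number is available through the %m format code, so a separate month-replacing function should not be needed. Below is a minimal sketch, assuming the target format is _2011-10-31_PM_8-15_ and that the default (English/C) locale is in effect. format_for_filename is a hypothetical helper name, not something from the script above.

```python
from datetime import datetime

def format_for_filename(d):
    """Format a datetime like '_2011-10-31_PM_8-15_' (hypothetical helper)."""
    # %I would zero-pad the hour ("08"), so compute the 12-hour value by hand.
    hour12 = d.hour % 12 or 12
    # %m is the numeric month; use %b instead to keep "Oct".
    return "_{}_{}-{}_".format(d.strftime("%Y-%m-%d_%p"), hour12, d.strftime("%M"))

d = datetime.strptime("Mon, Oct 31, 2011 at 8:15 PM", "%a, %b %d, %Y at %I:%M %p")
print(format_for_filename(d))  # _2011-10-31_PM_8-15_
```

Building the string from the parsed datetime keeps the regular expression responsible only for cleanup, while all date logic stays in the datetime module.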