python - string mask and offset with regex

March 15, 2011

I have a string on which I'm trying to create a regex mask that shows N words, given an offset. Let's say I have the following string:

"The quick, brown fox jumps over the lazy dog."

I want to show 3 words at a time:

offset 0: "The quick, brown"
offset 1: "quick, brown fox"
offset 2: "brown fox jumps"
offset 3: "fox jumps over"
offset 4: "jumps over the"
offset 5: "over the lazy"
offset 6: "the lazy dog."

I'm using the following simple regex to find 3 words:

    >>> import re
    >>> s = "The quick, brown fox jumps over the lazy dog."
    >>> re.search(r'(\w+\W*){3}', s).group()
    'The quick, brown '
But I can't figure out what kind of mask would show the next 3 words instead of the first ones. I need to keep the punctuation.

  
The prefix-matching option
You can make the offset a variable prefix pattern and capture the triplet in group 1.
Something like this:
    import re
    s = "The quick, brown fox jumps over the lazy dog."

    print(re.search(r'(?:\w+\W*){0}((?:\w+\W*){3})', s).group(1))
    # The quick, brown
    print(re.search(r'(?:\w+\W*){1}((?:\w+\W*){3})', s).group(1))
    # quick, brown fox
    print(re.search(r'(?:\w+\W*){2}((?:\w+\W*){3})', s).group(1))
    # brown fox jumps
Let's take a look at the pattern:

     _"word"_      _"word"_
    /        \    /        \
    (?:\w+\W*){2}((?:\w+\W*){3})
                 \_____________/
                     group 1
This does what it says: match 2 words, then capture the next 3 words in group 1.
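As a sketch (not part of the original answer), the offset can be filled into the same prefix pattern in a loop to reproduce the whole table from the question. This assumes Python 3 and stops at offset 6, the last complete 3-word window of this 9-word sentence; with this particular "word" pattern, larger offsets may still "succeed" by splitting words apart, which is the problem discussed in the note on the "word" pattern.

```python
import re

s = "The quick, brown fox jumps over the lazy dog."

windows = []
for offset in range(7):  # 9 words -> offsets 0..6 give full 3-word windows
    # Skip `offset` words without capturing, then capture the next 3 in group 1.
    pattern = r'(?:\w+\W*){%d}((?:\w+\W*){3})' % offset
    windows.append(re.search(pattern, s).group(1))

for offset, window in enumerate(windows):
    print(offset, repr(window))
```

Note that each window keeps the trailing whitespace consumed by `\W*`, e.g. `'The quick, brown '`.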
The (?:...) constructs are used to group for the repetition, but they are non-capturing.
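A quick illustration of why the non-capturing form matters (an added sketch, not from the answer): if the repeated groups capture, each one only remembers its last repetition and shifts the group numbers, while with (?:...) the triplet is the only group.

```python
import re

s = "The quick, brown fox jumps over the lazy dog."

# Capturing groups inside {n}: each group keeps only the text of its LAST
# repetition, and the triplet we want moves to group 2.
capturing = re.search(r'(\w+\W*){2}((\w+\W*){3})', s).groups()
print(capturing)

# Non-capturing (?:...) groups: the triplet is the only group, group 1.
non_capturing = re.search(r'(?:\w+\W*){2}((?:\w+\W*){3})', s).groups()
print(non_capturing)
```

The first call gives `('quick, ', 'brown fox jumps ', 'jumps ')`; the second gives just `('brown fox jumps ',)`.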
 
 
 
 
A note on the "word" pattern
It should be said that \w+\W* is a poor choice for a "word" pattern, as the following example shows:
    import re
    s = "nothing"
    print(re.search(r'(\w+\W*){3}', s).group())
    # nothing

There are no 3 words, but the regex was able to match anyway, because \W* allows an empty-string match.
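To see what the engine actually did, the three repetitions can be unrolled into three separate capturing groups (an illustrative rewrite, not from the original answer). Because \W* may match nothing, backtracking lets \w+ stop in the middle of the single word.

```python
import re

# Same "word" pattern repeated 3 times, but unrolled so each piece is visible.
pieces = re.match(r'(\w+\W*)(\w+\W*)(\w+\W*)', "nothing").groups()
print(pieces)
```

This prints `('nothi', 'n', 'g')`: three fragments of one word, each counted as a "word".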
  Perhaps a better pattern is something like this: 
    \w+(?:\W+|$)
That is, a \w+ that is followed by either a \W+ or the end of the string $.
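A sketch combining the stricter word pattern with the prefix-matching idea. The helper name `word_window`, the `WORD` constant, and the `^\W*` anchor (which pins the word count to the start of the string and tolerates leading punctuation) are assumptions added for illustration, not part of the original answer.

```python
import re

# Stricter "word": \w+ followed by one or more non-word chars, or end of string.
WORD = r'\w+(?:\W+|$)'

def word_window(s, offset, n=3):
    # Hypothetical helper: skip `offset` words, return the next `n`, or None.
    # ^\W* anchors the count at the start of the string.
    pattern = r'^\W*(?:%s){%d}((?:%s){%d})' % (WORD, offset, WORD, n)
    m = re.search(pattern, s)
    return m.group(1) if m else None

s = "The quick, brown fox jumps over the lazy dog."
print(word_window("nothing", 0))  # one word only, so no 3-word window
print(repr(word_window(s, 2)))
print(word_window(s, 7))          # 9 words, so offset 7 has no full window
```

Since a \w+ must now be followed by non-word characters or the end of the string, it can no longer stop mid-word, so too-short inputs and too-large offsets return None instead of a misleading match.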
 
The capturing-lookahead option
As suggested by a commenter, this option is simpler in that you only need one static pattern, and it captures all the triplets at once. It uses findall:
    import re
    s = "The quick, brown fox jumps over the lazy dog."

    triplets = re.findall(r"\b(?=((?:\w+(?:\W+|$)){3}))", s)

    print(triplets)
    # ['The quick, brown ', 'quick, brown fox ', 'brown fox jumps ',
    #  'fox jumps over ', 'jumps over the ', 'over the lazy ', 'the lazy dog.']

    print(triplets[3])
    # fox jumps over

How this works: it matches the zero-width word boundary \b and uses a lookahead to capture 3 "words" in group 1. Because the lookahead consumes nothing, consecutive matches can overlap.
        ______lookahead______
       /   ____"word"____    \
      /   /              \    \
    \b(?=((?:\w+(?:\W+|$)){3}))
         \___________________/
                group 1
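The same lookahead can be parameterized by window size. A minimal sketch, with `all_windows` as a hypothetical helper name; index k of the returned list corresponds to offset k as long as the string starts with a word.

```python
import re

def all_windows(s, n=3):
    # Hypothetical helper. \b matches an empty string at each word boundary;
    # the lookahead captures the next n "words" without consuming them, so
    # findall returns overlapping windows, one per word that begins a full window.
    pattern = r'\b(?=((?:\w+(?:\W+|$)){%d}))' % n
    return re.findall(pattern, s)

s = "The quick, brown fox jumps over the lazy dog."
triplets = all_windows(s)
pairs = all_windows(s, 2)
print(triplets[3])
print(pairs[-1])
```

Word boundaries at the end of a word produce no entry, because the lookahead needs a \w+ right there and fails.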