Using Apache Pig to match text
Given the text

hahahah my brother did not do anything wrong. He cheated in the exam? Of course not!

I'm trying to match "my brother did not do anything wrong".
Ideally, I'd match anything that starts with "my brother" and ends with either punctuation (end of sentence) or EOL.
Looking at the Pig docs, and then following the link to java.util.regex.Pattern, I think I should be able to use

    extrctd = FOREACH fltr GENERATE FLATTEN(EXTRACT(txt, '(my brother.*\\p{Punct})')) AS (txt:chararray);

But it seems to match up to the end of the line. Any suggestions for how to perform this match? I am ready to pull my hair out, and by pulling my hair out I mean switching to Python streaming.
By default, quantifiers are greedy, which means they match as much as possible. In this case you only want to match up to the first punctuation character. In other words, you want to match as little as possible.
So to solve your problem, make the quantifier non-greedy by adding a ? immediately after it:

    my brother.*?\\p{Punct}
                ^

Note that this use of ? differs from its use as a quantifier, where it means 'match zero or one'.
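Since Pig hands the pattern to java.util.regex, the difference between the greedy and non-greedy forms can be checked in plain Java before touching the Pig script. This is only a minimal sketch: the class name GreedyVsLazy and the helper firstMatch are made up for illustration, and the sample sentence is a tidied version of the one in the question. The third pattern, which also accepts end of input via `$`, is an assumption about how to cover the "punctuation or EOL" case and is not part of the fix above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Hypothetical helper: returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String txt = "hahahah my brother did not do anything wrong. "
                   + "He cheated in the exam? Of course not!";

        // Greedy: .* runs to the end of the input and backtracks to the LAST punctuation mark.
        System.out.println(firstMatch("my brother.*\\p{Punct}", txt));
        // prints: my brother did not do anything wrong. He cheated in the exam? Of course not!

        // Non-greedy: .*? stops at the FIRST punctuation mark.
        System.out.println(firstMatch("my brother.*?\\p{Punct}", txt));
        // prints: my brother did not do anything wrong.

        // Assumption: also accept end of input when the sentence has no punctuation.
        System.out.println(firstMatch("my brother.*?(?:\\p{Punct}|$)",
                                      "my brother did nothing wrong"));
        // prints: my brother did nothing wrong
    }
}
```

The doubled backslash in `\\p{Punct}` is there because the pattern sits inside a Java string literal, which is the same escaping the Pig statement in the question uses.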