Non Overlapping Pattern Matching With Gap Constraint In Python
Solution 1:
This can be solved elegantly with regex. We just have to convert the pattern
into a regex and then count how often that regex matches in the input sequence.
For example, given the input pattern = 'A B C'
and max_gap = 2
, we want to create regex like
A(arbitrary_word){,2}?B(arbitrary_word){,2}?C
Matching arbitrary words separated by spaces can be done with (?:\S+\s+)
, so we get:
import re
defcount_matches(pattern, seq, max_gap):
parts = map(re.escape, pattern.split())
sep = r'\s+(?:\S+\s+){{,{}}}?'.format(max_gap)
regex = r'\b{}\b'.format(sep.join(parts))
returnsum(1for _ in re.finditer(regex, seq))
Test runs:
count_matches('2982f 2982l 2981l', '2982f 2982f 2982l 2982l 2981l', 2)
# result: 1
count_matches('A B C', 'A B D E C B A B A B C', 2)
# result: 2
Solution 2:
I think you've done the hard part. Now just loop through the indices of the first word looking for indexes of the second word that are less than the gap, and so on. Then go back to the indices of the first word and if you found a match last time, skip any indices that fall into that match.
For example, here's the solution with only two words in pt:
i=0while i < len(pt_dic[pt_split[0]]):
ii = pt_dic[pt_split[0]][i]
#print "ii=" + str(ii)
j=0while j < len(pt_dic[pt_split[1]]):
jj = pt_dic[pt_split[1]][j]
#print "jj=" + str(jj)if jj > ii and jj <= ii + 2:
print"Match: (" + str(ii) + "," + str(jj) + ")"# Now that we've found a match, skip indices within that match.
i = next(x[0] for x inenumerate(pt_dic[pt_split[0]]) if x[1] > jj)
i -= 1# counteract the increment at the end of the outer loopbreak
j += 1
i += 1#print "i=" + str(i)
And with three words in pt:
i=0while i < len(pt_dic[pt_split[0]]):
match=False
ii = pt_dic[pt_split[0]][i]
#print "ii=" + str(ii)# Start loop at next index after ii
j = next(x[0] for x inenumerate(pt_dic[pt_split[1]]) if x[1] > ii)
while j < len(pt_dic[pt_split[1]]) andnot match:
jj = pt_dic[pt_split[1]][j]
#print "jj=" + str(jj)if jj > ii and jj <= ii + 2:
# Start loop at next index after jj
k = next(x[0] for x inenumerate(pt_dic[pt_split[2]]) if x[1] > jj)
while k < len(pt_dic[pt_split[2]]) andnot match:
kk = pt_dic[pt_split[2]][k]
#print "kk=" + str(kk)if kk > jj and kk <= jj + 2:
print"Match: (" + str(ii) + "," + str(jj) + "," + str(kk) + ")"# Now that we've found a match, skip indices within that match.
i = next(x[0] for x inenumerate(pt_dic[pt_split[0]]) if x[1] > kk)
i -= 1# counteract the increment at the end of the outer loop
match=True
k += 1
j += 1
i += 1#print "i=" + str(i)
And with four words in pt:
i=0while i < len(pt_dic[pt_split[0]]):
match=False
ii = pt_dic[pt_split[0]][i]
#print "ii=" + str(ii)# Start loop at next index after ii
j = next(x[0] for x inenumerate(pt_dic[pt_split[1]]) if x[1] > ii)
while j < len(pt_dic[pt_split[1]]) andnot match:
jj = pt_dic[pt_split[1]][j]
#print "jj=" + str(jj)if jj > ii and jj <= ii + 2:
# Start loop at next index after ii
k = next(x[0] for x inenumerate(pt_dic[pt_split[2]]) if x[1] > jj)
while k < len(pt_dic[pt_split[2]]) andnot match:
kk = pt_dic[pt_split[2]][k]
#print "kk=" + str(kk)if kk > jj and kk <= jj + 2:
# Start loop at next index after kk
l = next(x[0] for x inenumerate(pt_dic[pt_split[3]]) if x[1] > kk)
while l < len(pt_dic[pt_split[2]]) andnot match:
ll = pt_dic[pt_split[3]][l]
#print "ll=" + str(ll)if ll > kk and ll <= kk + 2:
print"Match: (" + str(ii) + "," + str(jj) + "," + str(kk) + "," + str(ll) + ")"# Now that we've found a match, skip indices within that match.
i = next(x[0] for x inenumerate(pt_dic[pt_split[0]]) if x[1] > ll)
i -= 1
match=True
l += 1
k += 1
j += 1
i += 1#print "i=" + str(i)
I think the pattern has been established now, so generalisation to an arbitrary number of words left as exercise for reader!
Post a Comment for "Non Overlapping Pattern Matching With Gap Constraint In Python"