Is there any device to do this that compares in Python?
For example, if I have http://google.com and google.com / , then I should know that they might Be the only one site.
If I had to manually create a rule, then I can uppercase it, then I can strip the http: // part, and the last alpha I can drop anything after Numeric character .. But I can see the failures of this, because I am sure you too can do it.
Is there any such library that does this? How are you going to do it?
From the top of my head:
def canonical_url (u ): U = u.lower () if u.startswith ("http: //"): u = u [7:] if u.startswith ("www."): U = u [4:] If u Endswith ("/"): u = u [: - 1] back urf same_urls (u1, u2): return canonical_url (u1) == canonical_url (u2) Obviously, its There are many rooms with more negligible. Regexes can be better than startwith and endwith, but you get this idea.
Comments
Post a Comment