Sorry if this is a duplicate, but I searched for 'Python Regular Expression Match' and did not find anything that answered my question.
The document I'm searching (which is a long HTML page, to clarify) contains a whole bunch of strings (inside a JavaScript function) that look like this:
    link: '/hidden/sidebySideGreen/dei1=1204970159862'};
    link: '/hidden/sidebySideYellow/dei1=1204970159862'};
I want to extract the links (i.e. everything between the quotes within these strings), e.g. /hidden/sidebySideGreen/dei1=1204970159862. To get the links, I know that I have to start with something like:
    re.matchall(regexp, doc_string)
but what should regexp be?
The answer to your question depends on how the rest of the string looks. If the strings all have exactly this form:
    link: '<URL>'};
then you can do it very simply using plain string manipulation:
    myString = "link: '/hidden/sidebySideGreen/dei1=1204970159862'};"
    # Drop the 7-character prefix "link: '" and the 3-character suffix "'};"
    print(myString[7:-3])
(If you just have one string that spans several lines, split it into lines first.)
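As a minimal sketch of the splitting idea above, assuming every relevant line has exactly the form link: '<URL>'}; (the variable names doc, PREFIX, SUFFIX and links are made up for illustration):

```python
# Hypothetical multi-line input; each line is assumed to look like: link: '<URL>'};
doc = """link: '/hidden/sidebySideGreen/dei1=1204970159862'};
link: '/hidden/sidebySideYellow/dei1=1204970159862'};"""

PREFIX = "link: '"   # 7 characters
SUFFIX = "'};"       # 3 characters

links = []
for line in doc.splitlines():
    line = line.strip()
    # Only slice lines that really have the expected shape
    if line.startswith(PREFIX) and line.endswith(SUFFIX):
        links.append(line[len(PREFIX):-len(SUFFIX)])

print(links)
```

Checking the prefix and suffix before slicing means lines with a different shape are skipped instead of being cut at the wrong positions.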
If it is a bit more complicated, then using a regular expression is fine. An example that just looks for the URLs inside the quotes:
    import re
    myDoc = """link: '/hidden/sidebySideGreen/dei1=1204970159862'};
    link: '/hidden/sidebySideYellow/dei1=1204970159862'};"""
    print(re.findall("'([^']+)'", myDoc))
Depending on how the whole string looks, you might have to include the link: prefix as well:
    print(re.findall("link: '([^']+)'", myDoc))
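To illustrate why anchoring the pattern on the link: prefix can matter, here is a rough sketch run against a larger fragment. The surrounding JavaScript (the setup function and the name fields) is invented for the example and is not from the original page; only the link values come from the question. The \s* is an added assumption to tolerate varying whitespace after the colon.

```python
import re

# Hypothetical page fragment; the JavaScript around the links is made up
html = """
<script>
function setup() {
    var a = {name: 'green', link: '/hidden/sidebySideGreen/dei1=1204970159862'};
    var b = {name: 'yellow', link: '/hidden/sidebySideYellow/dei1=1204970159862'};
}
</script>
"""

# Capture everything between the quotes that follow "link:"
LINK_RE = re.compile(r"link:\s*'([^']+)'")

# With a single capture group, findall returns the captured text only
links = LINK_RE.findall(html)
print(links)
```

A bare "'([^']+)'" pattern would also pick up 'green' and 'yellow' here, which is why the prefix is included.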