sophiaserpentia wrote in php

poorly formed JSON - options?

Hello everyone!

For a project I'm working on, I'm writing a screen scraper. Unfortunately the material I am attempting to mine is poorly-formed JSON (probably parsed with proprietary code). json_decode() returns NULL.

My first thought was that I'd just parse it with regular expressions. Here I've run into a question I can't seem to work out. Much of the data I'm trying to parse looks like

name:'blah blah blah'

So I thought perhaps I could use
/name:\'([^']*)\'/
to grab the stuff between single quotes.

Unfortunately some of the data has escaped single quotes inside, like this:
name:'blah\'s blah blah'
in which case my regex returns only "blah\".

So is there a way to say in regex speak, "stop at the next single quote, unless it is escaped with a backslash"? I've not been able to find any reference on how to write a regex that makes an exception to a negated character class.

As an alternative, I'm wondering if maybe there's something like Tidy, but for JSON. So far I'm not finding much.

Thanks!

ETA: After much trial and error I found a solution:
name:\'([^']+\\\'[^']+)\'|name:\'([^']*)\'
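Note that this pattern only copes with exactly one escaped quote per value. A more general way to say "stop at the next quote unless it is escaped" is to match either an ordinary character or a backslash followed by any character, repeated. Here is a minimal sketch of that idea; the sample input is made up, and using stripslashes() to undo the escapes is my assumption about what the data needs:

```php
<?php
// A nowdoc keeps the regex readable: nothing inside it is escaped by PHP.
// (?:[^'\\]|\\.)* reads as "any character that is neither a quote nor a
// backslash, OR a backslash plus whatever follows it", repeated.
$pattern = <<<'RE'
/name:'((?:[^'\\]|\\.)*)'/s
RE;

// Made-up sample input: one value with an escaped quote, one without.
$input = "name:'blah\\'s blah blah', name:'plain'";

preg_match_all($pattern, $input, $matches);

// Group 1 still contains the backslashes; stripslashes() removes them.
// Caveat: it would also turn a sequence like \n into a plain "n".
$names = array_map('stripslashes', $matches[1]);

print_r($names);
```

Because a backslash always consumes the character after it, this handles any number of escaped quotes in one value, as well as an escaped backslash right before the closing quote.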

But I'd still be curious to learn if there's a program like Tidy for JSON.
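
In the meantime, a small normaliser may be enough as a stand-in. This is only a rough sketch: it assumes the only deviations from real JSON are unquoted keys and single-quoted strings, and the function name relaxed_json_decode() is made up for illustration. It walks the text token by token so that anything already inside a string is left alone, then hands the result to json_decode().

```php
<?php
// Hypothetical helper: repair unquoted keys and single-quoted strings,
// then decode. Other problems (trailing commas, comments, \x escapes)
// are NOT handled and will still make json_decode() return NULL.
function relaxed_json_decode($text)
{
    // Alternatives, in order: a double-quoted string, a single-quoted
    // string, or a bare identifier that is followed by a colon (a key).
    $tokens = <<<'RE'
~"(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*'|(?<![A-Za-z0-9_$])[A-Za-z_$][A-Za-z0-9_$]*(?=\s*:)~s
RE;

    $json = preg_replace_callback($tokens, function ($m) {
        $first = $m[0][0];
        if ($first === '"') {
            return $m[0];                  // already valid JSON, keep as is
        }
        if ($first === "'") {
            $body = substr($m[0], 1, -1);  // drop the surrounding quotes
            // Turn \' into ' and a bare " into \"; keep other escapes.
            $body = preg_replace_callback('/\\\\.|"/s', function ($e) {
                if ($e[0] === "\\'") {
                    return "'";
                }
                return $e[0] === '"' ? '\\"' : $e[0];
            }, $body);
            return '"' . $body . '"';
        }
        return '"' . $m[0] . '"';          // bare key: just add quotes
    }, $text);

    return json_decode($json, true);
}

// Made-up sample in the same style as the scraped data.
$scraped = <<<'TXT'
{name:'blah\'s blah blah', count:3, "ok":true, note:'say "hi"'}
TXT;

$data = relaxed_json_decode($scraped);
print_r($data);
```

If the scraped material has other quirks, each one would need its own alternative in the token pattern, so a real parser for JavaScript object literals would be the sturdier choice if one turns up.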