Fixing HTML
Problem:
I have a site that displays blog/journal entries. Trusted technical people submit the entries, so the site allows unrestricted HTML. However, I need to process the submitted HTML slightly so that the output is correct XHTML syntax:
1. Replace img tags with correctly labeled links to the image
2. Replace blockquote with a span
3. Clean up br, hr and other single tag elements
There may be more.
Cleaning up br, hr and blockquote has been no problem, but I'm having trouble with my links and images. The first thing I tried was a preg_replace_callback call to reformat them, but my regexp skills are very lacking.
Basically, if the tag is:
<img src="url">
I want to display [img] where img is a clickable link to url.
If the tag is:
<img src="url" alt="alt">
I want to display [alt] where alt is a clickable link to url and alt is the alt text.
And finally, if the image is wrapped in a link:
<a href="url"><img src="url2" alt="alt"></a>
Then I want to display [alt] (or [img] if there is no alt) as a clickable link to url, not url2. My first few stabs were:
preg_replace_callback( '<img src="(.*)">', 'imagecleaner', $postbody );
However, only the text between the < and > is matched; the angle brackets themselves are left in place, so I get <[img]>.
Any help/suggestions appreciated.
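A likely cause of the <[img]> result: PHP's PCRE functions require pattern delimiters, and < and > are a valid bracket-style delimiter pair. The pattern '<img src="(.*)">' is therefore read as the expression img src="(.*)" wrapped in delimiters, so the angle brackets in the post are never part of the match. The sketch below is one possible way to handle the three cases described above, not a definitive implementation. It reuses the imagecleaner callback name from the attempt; attr_value and clean_images are hypothetical helper names. It assumes PHP 7.1 or later and double-quoted attribute values, and it copies attribute text through unchanged on the assumption that it is already valid XHTML.

```php
<?php
// Return the value of a double-quoted attribute inside a tag, or null.
// Hypothetical helper; assumes attributes look like name="value".
function attr_value(string $tag, string $name): ?string
{
    $pattern = '~\s' . preg_quote($name, '~') . '\s*=\s*"([^"]*)"~i';
    return preg_match($pattern, $tag, $m) ? $m[1] : null;
}

// Callback: $m[1] is the wrapping <a ...> tag ('' if absent),
// $m[2] is the <img ...> tag itself.
function imagecleaner(array $m): string
{
    $href  = $m[1] !== '' ? attr_value($m[1], 'href') : null;
    $url   = $href ?? attr_value($m[2], 'src') ?? '';
    $alt   = attr_value($m[2], 'alt');
    $label = ($alt === null || $alt === '') ? 'img' : $alt;
    return '<a href="' . $url . '">[' . $label . ']</a>';
}

// Hypothetical wrapper. The pattern uses ~ as its delimiter so that the
// literal < and > are matched. (?(1)\s*</a>) requires the closing </a>
// only when the optional opening <a ...> was matched.
function clean_images(string $postbody): string
{
    $pattern = '~(?:(<a\b[^>]*>)\s*)?(<img\b[^>]*>)(?(1)\s*</a>)~i';
    return preg_replace_callback($pattern, 'imagecleaner', $postbody);
}

echo clean_images('<img src="url">'), "\n";
echo clean_images('<img src="url" alt="alt">'), "\n";
echo clean_images('<a href="url"><img src="url2" alt="alt"></a>'), "\n";
```

Under those assumptions the three calls print <a href="url">[img]</a>, <a href="url">[alt]</a> and <a href="url">[alt]</a>. For HTML less regular than this (single-quoted attributes, a > inside an attribute value), a real parser such as PHP's DOM extension would likely be more robust than a regular expression.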
