Looking for WYSIWYG with header tags

         

hkinva

5:00 pm on Sep 23, 2020 (gmt 0)

5+ Year Member



We have a large, old site with a lot of crap code in the product description text. We are now settling a lawsuit over WCAG compliance and need <H> tags in that text instead of breaks. We figured we could drop the rendered code into a WYSIWYG editor and use the cleaned-up HTML, but I don't see a way that it would give us header tags. Any ideas would be really welcome.
TIA

not2easy

5:29 pm on Sep 23, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Hi hkinva and welcome to WebmasterWorld [webmasterworld.com]

Generally speaking, you might not want to use header <h> tags to replace line breaks, assuming you mean <br> (or <br />) between lines. Unless each <br> is between paragraphs of descriptive text anyway - header tags used for anything besides actual section headings are frowned on, judging from past discussions here. Your case may be quite different, and I apologize if I'm over-guessing.

It might be better to use <ul> <li> or even <p> tags if each line break is just the next line in a list of specifications. Most text editors offer find/replace functions and good text editors let you limit those replace functions to a particular area of the text, possibly via regex.

tangor

2:09 am on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@hkinva .... Glad to have you join WebmasterWorld!

Please post a sample of the HTML containing the BREAKS you want to convert to <h> tags ... exemplified as per the TOS, of course!

Might get a better answer if we can see what you are wishing to replace/convert!

lucy24

3:48 pm on Sep 26, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If your site design is so old the whole thing is filled with literal line breaks, I don’t think an editor (or CMS, or any one piece of software) is the solution. You need to look at the html--both source and output--with your human eyeballs and figure out exactly what needs changing. Eventually, some changes might be doable globally--say, from
<font face = "Arial">blahblah</font>
to
<span class = "sans">blahblah</span>
but the first step has got to be a close human inspection.
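
For illustration only, here is a minimal sketch of that kind of global change in Python, using the standard `re` module. The function name `font_to_span` is made up for this example, and the pattern assumes the <font> tags are never nested and always use double quotes; real pages would need the human inspection described above before running anything like this.

```python
import re

# Matches <font face = "Arial">...</font>, tolerating extra whitespace and
# mixed letter case. Assumption: <font> tags are not nested inside each other.
FONT_ARIAL = re.compile(
    r'<font\s+face\s*=\s*"Arial"\s*>(.*?)</font>',
    flags=re.IGNORECASE | re.DOTALL,
)

def font_to_span(html):
    """Replace Arial <font> wrappers with a class-based <span>."""
    return FONT_ARIAL.sub(r'<span class="sans">\1</span>', html)

print(font_to_span('<font face = "Arial">blahblah</font>'))
```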

tangor

1:41 am on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Just checking ... we're not talking about horizontal rules, are we?

hkinva

2:01 pm on Sep 27, 2020 (gmt 0)

5+ Year Member



OK, my associate has some ideas for software to replace the tags, but I have doubts. I'm looking for an easy solution, like pasting everything into a WYSIWYG editor. The code below is the general setup for most pages, except very old items.
So we need the <br/> or <br> tags removed and replaced with <p> tags, and the strong tags replaced with header tags: <H3> and nested <H4>. It looks like the <H> tags are not available in any WYSIWYG editor. Ideas welcome.

<section id="descriptionDetails">
<div class="description">
<span class="detailDescription">Description</span><br /><strong><span style="text-decoration:underline;">MODELXXXtss</span>:</strong> <strong>SKUhere</strong><br /><i>Brand NAMEhere</i><br /><br />Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam lacinia vitae diam non malesuada. Sed in mauris in orci pretium sodales. Mauris id arcu pulvinar, malesuada nunc non, lobortis risus. Curabitur pellentesque tortor vitae lacus sagittis, non vulputate mauris aliquet. Ut pharetra enim in arcu mollis pharetra. Aenean fermentum est in eros interdum, cursus maximus urna sollicitudin. <br /><br /><strong>44444 material</strong><br />Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam lacinia vitae diam non malesuada. Sed in mauris in orci pretium sodales. Mauris id arcu pulvinar, malesuada nunc non, lobortis risus. Curabitur pellentesque tortor vitae lacus sagittis, non vulputate mauris aliquet. Ut pharetra enim in arcu mollis pharetra. Aenean fermentum est in eros interdum, cursus maximus urna sollicitudin. <br /><br /><strong>ANOTHER FEATURE</strong><br />Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nullam lacinia vitae diam non malesuada. Sed in mauris in orci pretium sodales. Mauris id arcu pulvinar, malesuada nunc non, lobortis risus. Curabitur pellentesque tortor vitae lacus sagittis, non vulputate mauris aliquet. Ut pharetra enim in arcu mollis pharetra. Aenean fermentum est in eros interdum, cursus maximus urna sollicitudin.
<div class="clearfix"></div>
</div>
<div class="specifications">
<div class="features"></div>
<div class="specs"><strong><span style="text-decoration:underline;">Specifications</span></strong><ul><li>Length: 3.5"</li><li>another8”</li><li>Overal8"</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>blah blah</li><li>Weight: 3.7 oz.</li><li>Assembled in the USA</li></ul><h3>UPC Code: 7blah blah</h3></div>
<div class="clearfix"></div>
</div>
</section>

not2easy

3:28 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



MOST of this could be done with find/replace in a text editor, BUT it will require a manual check afterwards, because replacing <br /> with <p> only works IF each section has the same original format. I did it in 3 steps with a text editor, but there are no <br /> tags before the clearfix <div>, so the last <p> section before the <div class="clearfix"></div> ends up with no </p> tag. You could use find
<div class="clearfix">
and replace
</p><div class="clearfix">
for that if they are all alike.
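
As a rough sketch of the same idea in script form (not the exact three steps described above), the Python below splits a fragment on runs of two or more breaks and wraps every chunk in <p>...</p>, so the last chunk gets its closing tag even when no break follows it. The function name `breaks_to_paragraphs` is made up for this example, and it assumes you pass in only the description text, not the whole page; single breaks are left alone for manual review.

```python
import re

# Matches <br>, <br/> and <br /> in any letter case.
BR = r"<br\s*/?>"

def breaks_to_paragraphs(fragment):
    """Split on double (or longer) runs of breaks and wrap each chunk in <p>."""
    chunks = re.split(rf"(?:\s*{BR}\s*){{2,}}", fragment, flags=re.IGNORECASE)
    paragraphs = []
    for chunk in chunks:
        chunk = chunk.strip()
        if chunk:  # skip empty pieces left by leading/trailing breaks
            paragraphs.append(f"<p>{chunk}</p>")
    return "\n".join(paragraphs)

sample = "First block.<br /><br />Second block.<br><br>Third block, no trailing break."
print(breaks_to_paragraphs(sample))
```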

Most text editors used for html will include a utility to find and fix unmatched html tags - so that depends on what editor you would be using.

Notepad++ (Open source / free) for Windows or BBEdit for Mac can do this. I do not know a way to do this in bulk, without manual oversight, although BBEdit can edit entire folders of files at each step.

The use of
<h3>UPC Code: 7blah blah</h3> 
to style the UPC code is not a good practice. The <h> tags should not be used solely for styling when the text is not actually a header.

lucy24

3:41 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



:: happily salivating at the thought of devising a RegEx that could do most of the change in one fell swoop ::

I must say, that is some very weird-looking html. It looks as if some truly ancient code had been after-the-fact wrapped in a CMS operating on rules of its own. And what's that <h3>UPC Code business buried at the end? I can’t conceive of the UPC code meriting an h-anything, let alone <h3>, unless the whole page is based on sorting by UPC number, which it clearly isn't.

Edit: Looks like not2easy typed faster than me, so expect some overlap.

not2easy

3:51 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Administrator 10+ Year Member Top Contributors Of The Month



Many years ago I had a (WIN) text editor (iirc it was called PSD) that actually devised RegEx based on Find/Replace entries. I would have suggested it, but afaik it has been gone for over a decade. :(

NickMNS

4:16 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@lucy24
:: happily salivating at the thought of devising a RegEx that could do most of the change in one fell swoop ::

You must be a sadist. That sounds like REGEX hell.

My suggestion, which unfortunately requires the ability to code, would be to use Python with the BeautifulSoup package (BS4). It parses the HTML and then allows you to traverse the DOM. You can select the parent element, for example:
<div class="description">

Then find all the <strong> tags and iterate over them, changing the first to <h2>, the next to <h3> and so on.
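
A minimal sketch of that parse-and-rename idea follows. To keep it runnable without installing anything, it uses the standard library's `html.parser` rather than BeautifulSoup, and it maps every <strong> inside the description div to a single heading level instead of stepping through levels. The class name `StrongToHeading` and the helper `convert` are made up for this example.

```python
from html.parser import HTMLParser

class StrongToHeading(HTMLParser):
    """Re-emit HTML, renaming <strong> to a heading inside div.description."""

    def __init__(self, heading="h3"):
        super().__init__(convert_charrefs=False)  # keep &amp; etc. untouched
        self.heading = heading
        self.out = []
        self.div_depth = 0  # > 0 while inside <div class="description">

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.div_depth or "description" in classes:
                self.div_depth += 1
        if tag == "strong" and self.div_depth:
            self.out.append(f"<{self.heading}>")
        else:
            self.out.append(self.get_starttag_text())  # original tag text

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br />

    def handle_endtag(self, tag):
        if tag == "strong" and self.div_depth:
            self.out.append(f"</{self.heading}>")
        else:
            self.out.append(f"</{tag}>")
        if tag == "div" and self.div_depth:
            self.div_depth -= 1

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

def convert(html, heading="h3"):
    parser = StrongToHeading(heading)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)

sample = ('<div class="description"><strong>Material</strong><br />Text &amp; more'
          '<div class="clearfix"></div></div>'
          '<div class="specs"><strong>Specifications</strong></div>')
print(convert(sample))
```

Note that a blanket rename like this would also turn inline bold text (such as a SKU) into a heading, so the output still needs a human check.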

Moreover, Python can easily access the files within their folder structure. I have used this to correct errors in static pages for a very large website; I'm talking tens of millions of pages. It took a few hours but it worked. I just ran the script, came back, and it was done.
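
The folder-walking part might look roughly like the sketch below. The function name `rewrite_tree`, the `*.html` pattern, the UTF-8 encoding, and the `dry_run` switch are all assumptions made for this example, not details of the script described above. `transform` is any function that takes the page text and returns the corrected text.

```python
from pathlib import Path

def rewrite_tree(root, transform, pattern="*.html", dry_run=True):
    """Apply transform(text) -> text to every matching file under root.

    Returns the files whose content would change. Nothing is written
    unless dry_run is False.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        original = path.read_text(encoding="utf-8")  # assumes UTF-8 pages
        updated = transform(original)
        if updated != original:
            changed.append(path)
            if not dry_run:
                path.write_text(updated, encoding="utf-8")
    return changed
```

Running it once with the default `dry_run=True` on a copy of the site shows which files would be touched before anything is overwritten.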

lucy24

5:48 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



You must be a sadist.
Did you mean to say masochist? A sadist would be happily thinking about standing over someone else with a whip, compelling them to devise the RegEx ;)

NickMNS

11:45 pm on Sep 27, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yes Masochist, my bad...

hkinva

8:38 pm on Sep 29, 2020 (gmt 0)

5+ Year Member



using BeautifulSoup - we will get it all fixed. Thanks

tangor

9:57 am on Oct 1, 2020 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good work! Let us know how it turns out ... as in, did this lead to better results with the search engines?