I have this input file, temp.html:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</body></html>

I was hoping I could pretty-print and indent tags hierarchically by using xmlstarlet:

$ xmlstarlet fo --html --recover --indent-spaces 2 --omit-decl temp.html
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
  <body>
<div id="ext-comp-1725" class="x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window" style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
<div class="x-window-tl"><div class="x-window-tr"><div class="x-window-tc"><div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
<div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536"><div class="x-tool x-tool-icon x-tool-close"> </div></div>
<span class="x-window-header-text" id="ext-gen1541">Hello</span>
</div></div></div></div>
</div></body>
</html>

However, as the output above shows, it only indents some tags: it split <html><body> and indented those properly, but it left others untouched (for example, </div></div></div></div> stayed on a single line).

Is it possible to configure xmlstarlet to put every tag on its own line, with proper hierarchical indentation?

$ xmlstarlet --version
srcinfo-cache
compiled against libxml2 2.9.10, linked with 21209
compiled against libxslt 1.1.34, linked with 10142
The last release of xmlstarlet was over 10 years ago. You might look at prettier.io instead, which supports many languages. It is a Node program, so you will need Node and the overhead that comes with it. Commented Dec 3, 2024 at 20:26

2 Answers

HTML Tidy (tidy) works here. I found it via the question "A command-line HTML pretty-printer: Making messy HTML readable":

$ tidy --version
HTML Tidy for Windows version 5.8.0

$ tidy -indent -wrap 160 -ashtml -utf8 temp.html
line 3 column 1 - Warning: missing </div>
line 2 column 7 - Warning: inserting missing 'title' element
Info: Doctype given is "-//W3C//DTD HTML 4.0 Transitional//EN"
Info: Document content looks like HTML 4.01 Strict
Tidy found 2 warnings and 0 errors!

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"
"http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
  <meta name="generator" content="HTML Tidy for HTML5 for Windows version 5.8.0">
  <title></title>
</head>
<body>
  <div id="ext-comp-1725" class=
  "x-window FM-Msg-cls utility-window q-fileExplorer-window q-window show-header-line x-window-noborder x-window-plain x-resizable-pinned q-modal-window"
  style="position: absolute; z-index: 8020; visibility: visible; left: 188px; top: 62px; width: 900px; display: block;">
    <div class="x-window-tl">
      <div class="x-window-tr">
        <div class="x-window-tc">
          <div class="x-window-header x-window-header-noborder x-unselectable x-window-draggable" id="ext-gen1530" style="user-select: none;">
            <div class="x-tool-ct x-tool x-tool-bg" id="ext-gen1536">
              <div class="x-tool x-tool-icon x-tool-close">
                &nbsp;
              </div>
            </div><span class="x-window-header-text" id="ext-gen1541">Hello</span>
          </div>
        </div>
      </div>
    </div>
  </div>
</body>
</html>

About HTML Tidy: https://github.com/htacg/tidy-html5
Bug reports and comments: https://github.com/htacg/tidy-html5/issues
Official mailing list: https://lists.w3.org/Archives/Public/public-htacg/
Latest HTML specification: http://dev.w3.org/html5/spec-author-view/
Validate your HTML documents: http://validator.w3.org/nu/
Lobby your company to join the W3C: http://www.w3.org/Consortium

Do you speak a language other than English, or a different variant of
English? Consider helping us to localize HTML Tidy. For details please see
https://github.com/htacg/tidy-html5/blob/master/README/LOCALIZE.md
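
The run above mixes warnings, info lines and the "About HTML Tidy" footer in with the formatted markup, and it adds a generator <meta> element. A minimal sketch of a quieter invocation follows; the function name tidy_pretty is hypothetical, and the extra options (-quiet, --show-warnings, --show-info, --tidy-mark) are standard tidy options that the original answer did not use.

```shell
#!/bin/sh
# Hypothetical helper (not part of the original answer): pretty-print an
# HTML file with tidy while keeping the diagnostics out of the way.
# Usage: tidy_pretty FILE
tidy_pretty() {
    # -quiet              suppress the summary and the trailing "About" text
    # --show-warnings no  hide the per-line warnings
    # --show-info no      hide the "Info:" lines
    # --tidy-mark no      do not add <meta name="generator" ...>
    tidy -quiet -indent -wrap 160 -ashtml -utf8 \
        --show-warnings no --show-info no --tidy-mark no "$1"
    status=$?
    # tidy exits with 1 when it only reported warnings (as it would for the
    # missing </div> here); treat that as success and fail only on errors.
    [ "$status" -le 1 ]
}
```

Calling tidy_pretty temp.html > pretty.html would then write only the indented markup to pretty.html. Note that tidy still repairs the document, so the inserted <head> and <title> elements remain.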

1 Comment

Other than the " " inside the x-tool-close div coming out as "&nbsp;", that seems to work well.
The input is missing a closing </div>, so first let xmlstarlet repair it and convert it to well-formed XML, then format that XML in a second pass. By default, format indents with 2 spaces.

xmlstarlet -q format --html --recover --omit-decl temp.html |
xmlstarlet format --omit-decl
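
The two-pass pipeline can be wrapped in a small function so the indent width is adjustable. This is only a sketch: the function name html_indent and its optional second argument are hypothetical, and it has only been reasoned through for input like temp.html. It uses the same xmlstarlet options that already appear in the question and in this answer.

```shell
#!/bin/sh
# Hypothetical wrapper (not part of the original answer) around the
# two-pass pipeline.
# Usage: html_indent FILE [SPACES]
html_indent() {
    file=$1
    spaces=${2:-2}    # fall back to format's default of 2 spaces
    # Pass 1: parse as HTML, recover from errors such as the missing </div>,
    #         and write out the repaired tree (-q silences parser messages).
    # Pass 2: read that result from stdin as XML and re-indent it.
    xmlstarlet -q format --html --recover --omit-decl "$file" |
        xmlstarlet format --omit-decl --indent-spaces "$spaces"
}
```

For example, html_indent temp.html 4 would run both passes and indent with 4 spaces instead of 2. Elements that contain text, such as the <span>, are expected to stay on one line.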
