Karl,
Thanks a lot for this tip. That's exactly what I needed.
The key phrase seems to be (for me): "If your document uses transitional
markup, make sure your DOCTYPE reflects that fact and does not have a URI".
Does that mean I can/should use the transitional doctype declaration:
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
... but I have to delete the
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd" part?
I tried, but the images still have white gaps under them.
Can you tell me the exact doctype I need?
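If I've understood the tip correctly, it would mean a declaration like the
following, with the system-identifier URI simply left off (this is only my
reading of the advice, not something I've confirmed):

```html
<?xml version="1.0"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN">
<html xmlns="http://www.w3.org/1999/xhtml">
```

That's what I tried, so I may be missing something else.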
thanks in advance,
frank
Message: 3
Subject: Re: [Tidy-dev] White lines in Netcape 6
From: Karl Ove Hufthammer <***@bigfoot.com>
To: tidy-***@lists.sourceforge.net
Date: Fri, 22 Mar 2002 16:27:08 +0100
Frank Visser wrote:
This may be a known problem, but I just stumbled on it.
I have tidied a couple of sites which display well in IE
5.x and Netscape 4.x, but in Netscape 6 there are white
horizontal lines around images.
Please see:
<URL: http://developer.netscape.com/evangelism/docs/articles/img-table/ >
--
Karl Ove Hufthammer
--__--__--
Message: 4
Date: Fri, 22 Mar 2002 11:14:10 -0500
To: ***@interaccess.com
From: Charles Reitzel <***@rcn.com>
Subject: Re: [Tidy-dev] Tidy for html-xml parser and embedded C++.
Cc: tidy-***@lists.sourceforge.net
Hi Thaddeus,
First, if I read you right, you are asking for a library version of HTML
Tidy. A couple of folks have forged ahead with Tidy libraries. See
http://www.lemburg.com/files/python/mxTidy.html and
http://www.dysfunctionals.org/~lee/TidyCPP.zip, also
http://perso.wanadoo.fr/ablavier/TidyCOM/. Any of these will lag somewhat
behind the current version. For example, I have used TidyCOM with VB
successfully to do bulk Word-to-HTML conversions.
Otherwise, what you want to do _is_ doable, just not easily in C. In a
shell script or, better, Perl, it is not a problem. Simply use existing
Tidy options to send all the errors to a file, output to another file
and, perhaps, various informational messages to the standard output. The
non-informational messages (warnings and errors) are easily parsed with a
regular expression or even C strstr().
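As a sketch of that parsing step (assuming Tidy's usual
"line N column M - Warning/Error: ..." diagnostic format; check your
version's actual output), something like this would do:

```c
#include <stdio.h>
#include <string.h>

/* Parse one Tidy diagnostic line of the (assumed) usual form
 *   "line 12 column 5 - Warning: <img> lacks "alt" attribute"
 * On success, fills in line/col/is_error and returns 1;
 * returns 0 for informational lines that carry no position. */
static int parse_tidy_message(const char *msg, int *line, int *col, int *is_error)
{
    if (sscanf(msg, "line %d column %d", line, col) != 2)
        return 0;                 /* no "line N column M" prefix */
    if (strstr(msg, "- Error:") != NULL)
        *is_error = 1;
    else if (strstr(msg, "- Warning:") != NULL)
        *is_error = 0;
    else
        return 0;                 /* positioned but neither class */
    return 1;
}
```

Reading the error file line by line and running each line through this
gives you (line, column, severity) triples to act on.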
Also, if you are _documenting_ C code, you might try placing the C source
within either the <pre> or <code> tags.
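One caveat if you go that route: the C source needs its markup-significant
characters escaped first, or constructs like vector<T> will be read as
tags. A minimal escaping helper, just as a sketch:

```c
#include <string.h>

/* Copy src into dst, replacing the three characters that are
 * significant in HTML ('<', '>', '&') with entities, so C source
 * such as vector<T> can sit safely inside <pre>...</pre>.
 * dst_size must allow for expansion (worst case 5x, for "&amp;"). */
static void html_escape(const char *src, char *dst, size_t dst_size)
{
    size_t used = 0;
    for (; *src != '\0' && used + 6 < dst_size; ++src) {
        const char *rep = NULL;
        switch (*src) {
        case '<': rep = "&lt;";  break;
        case '>': rep = "&gt;";  break;
        case '&': rep = "&amp;"; break;
        default:
            dst[used++] = *src;   /* ordinary character: copy as-is */
            continue;
        }
        memcpy(dst + used, rep, strlen(rep));
        used += strlen(rep);
    }
    dst[used] = '\0';
}
```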
Hope this helps; send along any follow-up questions you may have.
thanks,
Charlie
Frank Visser wrote:
... be better off sending it to this group.
One addition to what I've written: I'm doing this on Linux.
First:
Someone suggested that I send my query to this mailing list.
I haven't been able to find any way to subscribe to this mailing list,
so please either send me the answer directly or show me how
to subscribe.
My problem:
I've written software which crawls through web pages, i.e. given a
web page, I find all the links (and all the images) on that page.
(The purpose of this is that I get a lot of manuals, books, etc. as
tar-gzipped sets of HTML documents [e.g. the Python documentation].
I then install these on my local web server [accessible only from my
LAN, of which I am the only user]. I download stuff faster than I can
add links, so the crawler finds all the files and adds links to them;
I try to be top-down and make a best guess at which pages are the
index pages.) Then I find all the links on the links, etc.
The main problem I have is parsing the page to find the links.
At first I tried using regular expressions, and it mostly worked, but:
1) It was fragile, and there seemed to be a growing list of exceptions
to the rules.
2) It was slow.
So then I used expat to parse the files. That was fine for the XML
files, but didn't work for the HTML files (of course). The solution
was: if expat choked on a file, run
    tidy -asxhtml -m $filename
on it and try again. Unfortunately, tidy chokes on some of the files.
Very few, though, so it looks worthwhile to handle them on a
case-by-case basis. The biggest offenders seem to be web pages that
contain embedded C++. For example: vector<T>. Tidy interprets this
as a tag <T>.
1) Instead of calling tidy via a system call, I would like to take the
tidy source, remove main(), and write a
    char *tidy(char *buffer, char *error);
where buffer is the file to be parsed, error is a buffer that receives
the error messages, and tidy returns an XHTML version of the buffer.
2) If this tidy function encounters an error, I would like some way of
being told at which character in the buffer the error first occurs, so
that I could do something like:
    int char_pos;
    memcpy(tidy_buffer, original_buffer, sizeof(file));
    tidy(tidy_buffer, error);
    while ((char_pos = error_is_bad_tag()) != 0)
    {
        fix_tag_at_pos(&original_buffer, char_pos);
        memcpy(tidy_buffer, original_buffer, sizeof(file));
        tidy(tidy_buffer, error);
    }
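To illustrate what an error_is_bad_tag()-style helper (hypothetical, like
the other helpers above) might look for, here is one crude heuristic:
treat a <X> whose "tag name" is a single capital letter as C++ template
syntax rather than markup. Real code would need more context than this.

```c
#include <stddef.h>
#include <ctype.h>

/* Scan the buffer for a "<X>" whose name is a single capital letter
 * (vector<T>, map<K>, ...), which is almost certainly C++ template
 * syntax rather than an HTML tag. Returns the character position of
 * the '<', or -1 if nothing suspect is found. */
static long find_suspect_tag(const char *buf)
{
    const char *p;
    for (p = buf; *p != '\0'; ++p) {
        if (p[0] == '<' && isupper((unsigned char)p[1]) && p[2] == '>')
            return (long)(p - buf);
    }
    return -1;
}
```

A fix-up pass could then escape that '<' as &lt; before re-running tidy.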
--__--__--
Message: 5
Date: Fri, 22 Mar 2002 17:33:23 +0100
To: ***@interaccess.com
From: Lee Goddard <***@LeeGoddard.com>
Subject: Re: [Tidy-dev] Tidy for html-xml parser and embedded C++.
Frank Visser wrote:
The main problem I have is parsing the page to find the links.
Your best bet is to use Perl; as it was designed for this, there are
modules for exactly this job. For example:
NAME
HTML::LinkExtor - Extract links from an HTML document
SYNOPSIS
require HTML::LinkExtor;
$p = HTML::LinkExtor->new(\&cb, "http://www.perl.org/");
sub cb {
    my($tag, %links) = @_;
    print "$tag @{[%links]}\n";
}
$p->parse_file("index.html");
DESCRIPTION
*HTML::LinkExtor* is an HTML parser that extracts links from an HTML
document. The *HTML::LinkExtor* is a subclass of *HTML::Parser*. This
means that the document should be given to the parser by calling the
$p->parse() or $p->parse_file() methods.
$p = HTML::LinkExtor->new([$callback[, $base]])
The constructor takes two optional arguments. The first is a
reference to a callback routine. It will be called as links are
found. If a callback is not provided, then links are just
accumulated internally and can be retrieved by calling the
$p->links() method.
The $base argument is an optional base URL used to absolutize all
URLs found. You need to have the *URI* module installed if you
provide $base.
The callback is called with the lowercase tag name as first
argument, and then all link attributes as separate key/value pairs.
All non-link attributes are removed.
$p->links
Returns a list of all links found in the document. The returned
values will be anonymous arrays with the following elements:
    [$tag, $attr => $url1, $attr2 => $url2,...]
The $p->links method will also truncate the internal link list. This
means that if the method is called twice without any parsing between
them, the second call will return an empty list.
Also note that $p->links will always be empty if a callback routine
was provided when the *HTML::LinkExtor* was created.
EXAMPLE
This is an example showing how you can extract links from a document
received using LWP:
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;
$url = "http://www.perl.org/"; # for instance
$ua = LWP::UserAgent->new;
# Set up a callback that collect image links
my @imgs = ();
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'img';  # we only look closer at <img ...>
    push(@imgs, values %attr);
}
# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);
# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
sub {$p->parse($_[0])});
# Expand all image URLs to absolute ones
my $base = $res->base;
@imgs = map { $_ = url($_, $base)->abs; } @imgs;
# Print them out
print join("\n", @imgs), "\n";
SEE ALSO
the HTML::Parser manpage, the HTML::Tagset manpage, the LWP manpage,
the URI::URL manpage
COPYRIGHT
Copyright 1996-2001 Gisle Aas.
This library is free software; you can redistribute it and/or modify it
under the same terms as Perl itself.
hth
lee
--__--__--
_______________________________________________
Tidy-develop mailing list
Tidy-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/tidy-develop
End of Tidy-develop Digest