                        _____________________________
                                WRITING HTML
                        
                          A "quick and dirty" guide
                         to creating a HTML document
                            by:  Mårten Lindström


 HTML really IS very  simple,  which  may  not  be  immediately evident when
 looking at the HTML specification  documents.  In  this presentation I will
 skip over most of the  theory  and  furthermore  limit  it to (most of the)
 HTML 2.0 features.

 If there is interest in it, I could perhaps write a further, more in-depth
 article with full coverage of both HTML 2.0 and 3.2.



                      HOW TO TURN PLAIN TEXT INTO HTML
                      

 Just take an existing plain ("ascii") text and do the following:


 1)  Replace all occurrences of             &   with   &amp;
     including the semicolon.


 2)  Now, in the same way, replace every    <   with   &lt;
     and perhaps, to be sure, also          >   with   &gt;
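     (Replacing > is not strictly necessary, but it does no harm. Do keep
     the order, though: step 1 must come first, or the & in every &lt;
     would itself be turned into &amp;.)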


 3)  Insert a "TITLE" (used by the browser as the caption of the document
     window) at the start of the text:

         <TITLE> Some text for caption of document window </TITLE>

     Some browsers may also use it to determine that the text IS a HTML
     document at all.


 4)  Insert  <P>  before each and every PARAGRAPH of your text.
     Remember that the  browser  will  IGNORE  ALL  NEWLINES  in your source
     (instead formatting the text according to the current window width) and
     will split your text into paragraphs based solely on these <P> tags.


 5)  If you have used any Atari-specific  characters (the ones in the second
     half of the character  set  -  including  British  pound sign and "non-
     English" letters) then you must also convert these into the "ANSI" (aka
     ISO 8859-1 aka Latin-1) character  set.  For instance using my ANSIFIER
     program in Ictari 39.


 Done!

 (Now inspect your text with CAB or HTML-Browser!)
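 For anyone with Python to hand, steps 1 to 4 could be automated roughly as
 follows. This is only a sketch: it assumes that paragraphs in the plain
 text are separated by blank lines, the function names are my own, and
 step 5 (the Atari to Latin-1 conversion) is left out.

```python
import re

def escape_text(text: str) -> str:
    """Steps 1 and 2. '&' is replaced first, otherwise the '&' in the
    '&lt;' and '&gt;' produced afterwards would be escaped again."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")

def text_to_html(title: str, text: str) -> str:
    """Steps 1-4, assuming paragraphs are separated by blank lines.
    Step 5 (character-set conversion) is NOT done here."""
    escaped = escape_text(text)
    # A paragraph is taken to be any run of text between blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", escaped) if p.strip()]
    lines = [f"<TITLE> {escape_text(title)} </TITLE>"]   # step 3
    lines += [f"<P> {p}" for p in paragraphs]             # step 4
    return "\n".join(lines) + "\n"

print(text_to_html("Fish & Chips", "Cod < Haddock.\n\nThe end."))
```

 Newlines inside a paragraph are simply left in place, since the browser
 ignores them anyway.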

 Any newlines and extra spaces (beyond a single space between words) will be
 ignored by the browser, so you are free to insert as many as you like, to
 improve the readability of the plain source text.

 Note on start tags and end  tags:  Most  types of elements, like the title,
 need BOTH start and  end  tag  (<title>  and  </title>)  while  a few, like
 paragraphs, don't. (It is enough  to  start  each paragraph with <p> though
 you optionally also COULD end it with </p>.)
 There are even some elements  that  NEVER  have  an end tag, simply because
 they don't contain any document text - see <HR> and <IMG> below.

 Furthermore, in clean HTML, elements can be contained within each other but
 should never overlap. For instance, in  order  to use both bold AND italics
 style on some text you could write:

     <b><i>some text</i></b>                   An i element cleanly within b
 or  <i><b>some text</b></i>                    A b element cleanly within i

 but the following versions, on the other hand
     <b><i>some text</b></i>
 and <i><b>some text</i></b>
 are not clean HTML, although most browsers might understand them anyway.
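 The "no overlapping" rule can be checked mechanically by keeping a stack of
 currently open elements, as in this rough Python sketch. The function name
 and the set of elements without a compulsory end tag are my own
 assumptions, covering only the tags mentioned in this article.

```python
import re

# Elements whose end tag may (P, LI) or must (HR, IMG) be left out.
# An assumption covering only the tags in this article.
NO_END_TAG_NEEDED = {"P", "LI", "HR", "IMG"}

# A start or end tag; comments (<!-- ... -->) do not match.
TAG = re.compile(r"<(/?)([A-Za-z][A-Za-z0-9]*)[^>]*>")

def is_cleanly_nested(html: str) -> bool:
    """Return True if no two elements overlap."""
    open_tags = []
    for closing, name in TAG.findall(html):
        name = name.upper()
        if name in NO_END_TAG_NEEDED:
            continue
        if not closing:
            open_tags.append(name)
        elif not open_tags or open_tags.pop() != name:
            return False   # end tag does not close the innermost open element
    return not open_tags   # every start tag must also have been closed

print(is_cleanly_nested("<b><i>some text</i></b>"))   # True
print(is_cleanly_nested("<b><i>some text</b></i>"))   # False
```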



                                 REFINEMENTS
                                 

 Pre-formatted text
 
 Instead of the <P> tags preceding every paragraph, you could have merely
 preceded the whole text (after the title element) with a <PRE> tag and
 followed it with </PRE>. This suppresses automatic word-wrapping: the
 browser preserves all spaces and newlines literally and uses a monospaced
 font for the text, behaving essentially like the familiar old ASCII text
 viewer.

 More typically, you would use  <PRE>  and  </PRE> tags only around selected
 parts of the text, such as program listings.
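 Note that text inside PRE still needs steps 1 and 2 above: tags are still
 recognised there, so a listing containing < or & must be escaped. A small
 Python sketch (the function name is hypothetical):

```python
def pre_block(listing: str) -> str:
    """Wrap a program listing in <PRE> ... </PRE>. Spaces and newlines are
    kept as they are, but '&', '<' and '>' are still escaped."""
    escaped = listing.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return "<PRE>\n" + escaped + "\n</PRE>"

print(pre_block("if (a < b && c)\n    swap(a, b);"))
```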


 Headings
 
 To turn a paragraph into  a  heading,  just  remove  the  <P> before it and
 instead enclose it in <H1> and </H1>, thus:

     <H1> Some heading in your document </H1>

 With H2 instead of H1 you will get a smaller heading, H3 results in an even
 smaller heading, down to H6 for the smallest possible heading.
 (A recommendation is to not skip heading  levels,  i.e. after a H1 don't go
 down to H3 before you have used H2.)
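 Skipped heading levels are easy to spot automatically. The following
 Python sketch (a hypothetical helper, assuming headings are written as
 <H1> ... <H6>) reports each place where the level goes down by more than
 one step. Going back UP several levels, e.g. from H3 to H1, is fine.

```python
import re

HEADING = re.compile(r"<H([1-6])\b", re.IGNORECASE)

def skipped_levels(html: str) -> list[tuple[int, int]]:
    """Return (previous, next) pairs where a heading level is skipped,
    e.g. (1, 3) for an H3 directly after an H1."""
    problems = []
    previous = None                       # no heading seen yet
    for match in HEADING.finditer(html):
        level = int(match.group(1))
        if previous is not None and level > previous + 1:
            problems.append((previous, level))
        previous = level
    return problems

print(skipped_levels("<H1>A</H1> <H3>B</H3> <H2>C</H2>"))   # [(1, 3)]
```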


 Lists
 
 A list, bulleted or numbered, can be written thus:

     <UL>                              In browser this will appear as
       <LI> Text for first item          *   Text for first item
       <LI> second item                  *   second item
       <LI> third ... etc.               *   third ... etc.
     </UL>

 or
     <OL>                              In browser this will appear as
       <LI> Text for first item          1.  Text for first item
       <LI> second item                  2.  second item
       <LI> third ... etc.               3.  third ... etc.
     </OL>

 UL stands for Unordered (i.e. bulleted) List,
 OL stands for Ordered (i.e. numbered) List.

 Each LI (List Item) element could  also contain multiple paragraphs or even
 sub-lists (but not headings).

 The indentation I have used on  the  <LI>  elements is of course purely for
 readability of the source text  and  won't  affect how the browser displays
 them (they will typically be displayed indented anyway).
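 Lists are also easy to generate from a program. A minimal Python sketch,
 assuming the item texts have already been escaped as in steps 1 and 2; the
 function name is my own:

```python
def html_list(items: list[str], ordered: bool = False) -> str:
    """Build a UL (bulleted) or OL (numbered) list from escaped item texts."""
    tag = "OL" if ordered else "UL"
    lines = [f"<{tag}>"]
    # The indentation is purely cosmetic; the browser ignores it.
    lines += [f"  <LI> {item}" for item in items]
    lines.append(f"</{tag}>")
    return "\n".join(lines)

print(html_list(["Text for first item", "second item"], ordered=True))
```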


 Horizontal Rules
 
 Just insert <HR> where you want a horizontal division line in the text. In
 monochrome it will simply be a thin black line, while in colour most
 browsers make it appear as a three-dimensional groove (achieved by using
 two colours for it: top = dark gray or black, bottom = white; the text
 background being not white but light gray).


 Images
 
 An image can be inserted anywhere in the text flow with:

     <IMG SRC="SOMEPATH/FILENAME.GIF">

 For really GOOD HTML, you should  also  add,  within  the IMG tag, an extra
 attribute:   ALT="Text that is displayed if image not shown".
 Note: ALT="" is entirely appropriate to use with pure adornment images. The
 ALT text should _NOT_ be a picture DESCRIPTION but an ALTERNATIVE.
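 If image tags are generated by a program, remember that the ALT text sits
 inside a quoted attribute value, so any & or " in it must be escaped. A
 minimal Python sketch; the function name and the path PICS/LOGO.GIF are
 just examples. The numeric reference &#34; is used for the quote
 character.

```python
def img_tag(src: str, alt: str = "") -> str:
    """Build an IMG tag with an ALT attribute; alt="" suits pure adornments."""
    # Inside a quoted attribute value, '&' and '"' must be escaped.
    alt = alt.replace("&", "&amp;").replace('"', "&#34;")
    return f'<IMG SRC="{src}" ALT="{alt}">'

print(img_tag("PICS/LOGO.GIF", "Ictari"))   # <IMG SRC="PICS/LOGO.GIF" ALT="Ictari">
```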


 GIF is the  most  widely  recognized  picture  file  format,  while JPEG is
 understood by most newer browsers (this  SHOULD include CAB (?)). Only now,
 in October this year (96), was PNG  formally adopted by W3C (the World Wide
 Web Consortium), but it will probably replace GIF eventually.


 Hyperlinks
 
 Any image or piece of text could also be made into a hyperlink by enclosing
 it in <A HREF="SOMEPATH/SOMEFILE"> and </A>
 For instance:

     <A HREF="SUBDOC.HTM">This is a clickable link</A>

 A link doesn't necessarily have  to  lead  to  another HTML file. You could
 make links to ANY kind  of  file,  though  the  browser  may not be able to
 display it, of course. Plain ("ascii")  text  files as well as (GIF) images
 normally ARE displayed directly by the browser, others may be passed by the
 browser  to  some  other  program  (if   a   protocol  for  this  has  been
 established).

 Note: When CAB displays a plain  text  file it treats characters 160-255 as
 ANSI (like in a HTML file) rather  than  Atari. Not the ideal behaviour for
 an Atari browser I would say.
                               -----

 More generally, an A element  ("Anchor")  can  be  jumped both TO and FROM,
 making it possible to jump not  only  between  different files but within a
 HTML file. For an anchor to  serve  as  a  starting point, a HREF attribute
 must be present, as above, while an anchor serving as a destination must
 have a NAME attribute, for instance:

     <H2><A NAME="CONCL"> Conclusions </A></H2>

 A link to this anchor from some other anchor in the same document could
 then be written as:

     <A HREF="#CONCL"> See conclusions </A>

 Note the '#' character. Links could  also  be  made from other documents by
 preceding the '#' with a relative pathname, e.g.:

     <A HREF="END.HTM#CONCL"> See conclusions </A>
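 Since a misspelt NAME simply makes a link lead nowhere, it can be worth
 checking that every "#..." link within a document has a matching anchor.
 A rough Python sketch, assuming attribute values are written in double
 quotes as in the examples above; the helper name is my own, and links into
 other files (like END.HTM#CONCL) are not checked.

```python
import re

NAME_ATTR = re.compile(r'<A\s[^>]*\bNAME="([^"]*)"', re.IGNORECASE)
HREF_ATTR = re.compile(r'<A\s[^>]*\bHREF="([^"]*)"', re.IGNORECASE)

def broken_internal_links(html: str) -> list[str]:
    """Return the same-document '#NAME' links that have no NAME anchor."""
    names = set(NAME_ATTR.findall(html))
    return [href for href in HREF_ATTR.findall(html)
            if href.startswith("#") and href[1:] not in names]

doc = ('<H2><A NAME="CONCL"> Conclusions </A></H2>\n'
       '<A HREF="#CONCL">ok</A> <A HREF="#INTRO">bad</A>')
print(broken_internal_links(doc))   # ['#INTRO']
```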


 Text Styles in HTML
 
 Enclosing text with <B> and </B> will render it in BOLD type; similarly <I>
 and </I> for italics and <TT> and </TT> for a monospaced (TeleType) font.

 However, this TYPOGRAPHIC markup is slightly out of line with the rest of
 HTML (at least until anomalies like the FONT element of HTML 3.2 appeared,
 which should become obsolete with the expected addition of STYLE SHEETS).
 HTML mainly tries to concentrate on the LOGICAL purpose of the text. And so
 there is an alternative logical or "idiomatic" markup system:

     <EM> ... </EM>      for emphasized                          italics
 <STRONG> ... </STRONG>  for strongly emphasized                 bold

   <CITE> ... </CITE>    for book titles etc.                    italics

    <VAR> ... </VAR>     for variables (in syntax descriptions)  italics
   <CODE> ... </CODE>    for some code element                   monospaced
    <KBD> ... </KBD>     for text typed by user (in eg manuals)  monospaced
   <SAMP> ... </SAMP>    for some sample of literal characters   monospaced

 In the last column  I  have  listed  how  browsers  typically display these
 elements, and, of course, the styles overlap  both with each other and with
 the typographic markup. So what's the point of all this?

 Answers:

  1) Logical markup allows the browsing software  (and human if the software
     allows it) to CHOOSE how each  element  type  is  to be displayed - for
     instance using different colours.

  2) It may simplify automatic processing of  the  text, e.g. by indexers or
     text analysers.

 Still, it should be said that many or most people will probably never use
 anything but the typographic markup, because it reminds them of the
 familiar, old-fashioned word-processor they are used to (and because the
 typographic tags are admittedly a little shorter than the idiomatic ones).
 Typographic markup is of course also what programs automatically converting
 from word-processor file formats will always have to use.


 Comments
 
 COMMENTS (ignored by the browser) can be inserted anywhere in a HTML
 document, enclosed between <!-- and -->. For example:

   <!-- This text will look better if viewed with a HTML browser -->
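 A program can drop comments the same way the browser ignores them. A
 simplified Python sketch (a hypothetical helper; it assumes no comment
 contains a -- of its own):

```python
import re

# Non-greedy, and DOTALL so that a comment may span several lines.
COMMENT = re.compile(r"<!--.*?-->", re.DOTALL)

def strip_comments(html: str) -> str:
    """Remove all <!-- ... --> comments from the text."""
    return COMMENT.sub("", html)

print(strip_comments("A<!-- hidden -->B"))   # AB
```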



                         PATHS FOR IMAGES AND LINKS
                         

 Just about any familiar old relative DOS path and filename is acceptable in
 the <IMG> and <A> tags, EXCEPT that  FORWARD slash characters (/) should be
 used instead of the DOS backslashes (\) as separators. Browsers on Atari
 and PC will probably understand  backslashes  too,  but e.g. a Unix browser
 may be more pleased to see forward slashes.

 You should probably also try to  use  UPPERCASE letters only, for your path
 and file names, since this is how names  on files and folders are stored by
 (GEM)DOS. Even though TOS/DOS/Windows are case insensitive, Unix isn't.

 These remarks matter in case you ever transfer your files to Unix or a
 similar system. Besides, you might as well learn proper HTML (strictly,
 proper URLs, i.e. "Web paths") from the start.

 It is quite OK to use even the familiar old DOS double-dot ".." for moving
 up one folder level. For instance: ../INDEX.HTM

 Paths are counted from where the current HTML document is located.
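 The rules above (forward slashes, uppercase names, and paths counted from
 the current document's folder) can be sketched in Python like this. The
 function names and example paths are hypothetical; posixpath is used so
 that the result is the same whatever system the script runs on.

```python
import posixpath

def dos_to_url_path(dos_path: str) -> str:
    """Turn a relative DOS path into URL form: forward slashes as
    separators and uppercase names, as stored by (GEM)DOS."""
    return dos_path.replace("\\", "/").upper()

def resolve(document: str, link: str) -> str:
    """Where a relative link points, counted from the document's folder."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(document), link))

print(dos_to_url_path("..\\PICS\\logo.gif"))        # ../PICS/LOGO.GIF
print(resolve("DOCS/ARTICLE.HTM", "../INDEX.HTM"))  # INDEX.HTM
```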



                       PROPOSITION FOR ICTARI ARTICLES
                       

 May I here make the suggestion that pictures and sub-documents of any
 Ictari article in HTML format normally be placed in a folder with the name
 of the HTML article but with the extension .SUB (or .PIX) instead of .HTM.

 For instance a document
     ARTICLE.HTM
 would have its sub-documents (and pictures (?)) in the folder
     ARTICLE.SUB

 This would tidy the disk directories so that the main .HTM file would
 always be easy to find. And, regardless of how many pictures and
 sub-documents are referred to by it, there would in most cases be only two
 items - the HTM file and a SUB folder - to deal with during disk operations
 such as move or copy.

 In order to convert an  existing  HTML  document  into this format you will
 need to search it for occurrences  of  the  SRC attribute (in IMG tags) and
 HREF attribute (in A tags) and change the given paths appropriately.
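 The conversion could be partly automated. This Python sketch assumes that
 all pictures and sub-documents previously lay in the same folder as the
 article; the function names are my own. It rewrites only the main
 document: paths inside the sub-documents themselves (e.g. links back to the
 article, which would then need a leading ../) must still be checked.

```python
import os
import re

# SRC="..." or HREF="..." with the value in double quotes (an assumption).
ATTR = re.compile(r'\b(SRC|HREF)="([^"]*)"', re.IGNORECASE)

def sub_folder(article: str) -> str:
    """ARTICLE.HTM -> ARTICLE.SUB"""
    return os.path.splitext(article)[0] + ".SUB"

def move_into_sub(html: str, article: str) -> str:
    """Prefix relative SRC/HREF paths with the article's SUB folder."""
    folder = sub_folder(article)

    def fix(match):
        attr, path = match.group(1), match.group(2)
        # Leave in-document jumps, absolute paths and full URLs alone.
        # (A "../X" path becomes "ARTICLE.SUB/../X", i.e. still X.)
        if path.startswith(("#", "/")) or ":" in path:
            return match.group(0)
        return f'{attr}="{folder}/{path}"'

    return ATTR.sub(fix, html)

print(move_into_sub('<IMG SRC="LOGO.GIF"> <A HREF="#TOP">top</A>', "ARTICLE.HTM"))
# <IMG SRC="ARTICLE.SUB/LOGO.GIF"> <A HREF="#TOP">top</A>
```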

