
                THE GIF AND TIFF PICTURE FILE FORMATS
                        BY:  Mårten Lindström
                -------------------------------------

     Some time after I had  written  the  IMG and IFF ILBM description
(Ictari 16) I received the official  TIFF and GIF documentation files.
After digesting these and making some  experiments  of my own, I wrote
this file. Although the full official documentation should probably be
available to all Ictari members (the TIFF documentation was actually
published in issue 16 mentioned above; the GIF documents I am sending
to Ictari with this text), it is my hope that there will still be some
value in a shorter description written by an Atari programmer for
other Atari programmers to read.

     Included is also - yet another - description of LZW un/packing,
which I hope is easier to understand than the previously published
ones (at least it's different), and also closer, I think, to how it
would actually be programmed in an efficient way. (See next month).


                   General notes about GIF and TIFF
                   --------------------------------

GIF  (Graphics  Interchange  Format)   was   designed  by  CompuServe,
primarily intended for telecommunication  uses.  It can handle palette
colour  images  only  (max.  256   colours),   which  are  always  LZW
compressed.

TIFF (Tag Image  File  Format)  was  made  up  by  Aldus and Microsoft
primarily for DTP, and originally  couldn't  handle colour images. Its
design was however from the start very  flexible  (a bit like IFF - my
personal favourite), and it was  soon  extended  to handle any type of
bitmap image, thereby adopting the LZW scheme of GIF. (There now or in
the future will exist JPEG compressed  TIFF  images as well, but about
this I know nothing.)


From an Atari user's point of view, both GIF and TIFF use a format for
the image data that may  make  them  less  attractive  than IMG or IFF
ILBM, requiring a further - time and space demanding - conversion step
in addition to the de/compression.  LZW  itself  is in addition not as
lightning fast as the compression schemes used in IFF ILBM and IMG. On
the other hand LZW isn't exactly  slow  either, even on a bog standard
ST, and  although  a  certain  delay  (a  good  second  or  a  few) is
unavoidable during  unpacking  and  conversion,  this  should  be less
noticeable when using a floppy disk  where  the file loading itself is
probably the most time consuming  step.  For  a floppy user a slightly
shorter loading time, due to the effectiveness of LZW, could even make
up for part of the unpacking  time.  (And  maybe the lesser demands on
disk space make up  for  the  extra  demands  on  internal memory work
space?) GIF and TIFF are  admittedly  also  more common formats in the
general (=PC) computer world.

Regarding the LZW effectiveness I  made  some - limited - experiments,
where LZW in all cases  beat  the  compression  schemes of IMG and IFF
ILBM. In some  cases  the  best  of  the  latter  -  the vertical word
compression of (DeluxePaint) IFF ILBM (and  Tiny) - came pretty close.
But with other pictures (simple maps and charts not making full use of
the available range of colours) the LZW victory was devastating.


So before describing the file formats  let's  have a look at the image
data themselves.



                The uncompressed image in GIF and TIFF
                --------------------------------------

A non-compressed monochrome image will be stored exactly the same in a
TIFF file as it would in the Atari environment, except that the latter
usually requires pixel rows to begin on word boundaries, while in TIFF
they only have to begin on byte boundaries.

In GIF each pixel, in  a  non-compressed  image,  takes up a full byte
(with 7 leading zero-bits in the case  of a mono image). This may seem
a terrible waste but will not  result  in bigger files since the image
data in these are always compressed.


The real difference comes with colour palette images, which are in the
Atari environment bitplane separated or interleaved.

Not so in GIF or TIFF.  Instead  the  complete data for each pixel are
stored in consecutive bits. Again, these  in  GIF are always allowed a
full byte per pixel,  while  in  TIFF  smaller  than  8-bit values are
packed into bytes (as tightly as  possible "left to right", i.e. first
using the most significant bits of each  byte, and with no unused bits
except at the end of a line). Every line begins on a byte boundary.


Example: A pixel row of a  3  pixel  wide 16-colour image could in the
Atari environment look like:

   bitplane 0       bitplane 1       bitplane 2       bitplane 3
 % 001------------- 011------------- 101------------- 111-------------
pix012              012              012              012

(where the colour number for each pixel is formed by the corresponding
bits of each plane, beginning with bitplane 3 and ending with
bitplane 0.)


In GIF this would (non-compressed) look like:

  bit 76543210 76543210 76543210
    % 00001100 00001010 00001111
pixel     0        1        2

And in TIFF:

  bit 76543210 76543210
    % 11001010 1111----
pixel 0   1    2   fill
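
The three layouts above can be reproduced with a small sketch. It is
written in Python purely for illustration; the function names are made
up here and are not part of any GIF or TIFF documentation.

```python
def planar_to_pixels(planes, width):
    """Combine Atari-style bitplane words into one colour number per pixel.

    planes[0] is bitplane 0 (the least significant bit of the colour
    number); each plane is a 16-bit word whose bit 15 is the leftmost pixel.
    """
    pixels = []
    for x in range(width):
        colour = 0
        for plane_no, word in enumerate(planes):
            bit = (word >> (15 - x)) & 1
            colour |= bit << plane_no
        pixels.append(colour)
    return pixels


def pack_tiff_row(pixels, bits_per_pixel):
    """Pack pixel values left to right, most significant bits first,
    padding the last byte of the row with zero bits."""
    acc = 0
    nbits = 0
    out = bytearray()
    for p in pixels:
        acc = (acc << bits_per_pixel) | p
        nbits += bits_per_pixel
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


# The 3-pixel row from the example (only the top 3 bits of each word used).
planes = [0b001 << 13, 0b011 << 13, 0b101 << 13, 0b111 << 13]
pixels = planar_to_pixels(planes, 3)     # colour numbers 12, 10, 15
gif_row = bytes(pixels)                  # GIF: one byte per pixel
tiff_row = pack_tiff_row(pixels, 4)      # TIFF: two pixels per byte
```

The fill bits at the end of the TIFF row are written as zeroes here;
that is a choice made in this sketch.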


In TIFF RGB images the RGB values for each pixel are by default stored
as three consecutive values (like the Falcon High Colour screen memory
is organized), but  can  also  be  stored  in  three  separate "sample
planes" (see PlanarConfiguration in the TIFF description.)


In GIF no values greater than 8-bit have been foreseen, while the TIFF
definition states that in  such  an  unlikely  case,  values are to be
packed into  words  (or  longs  if  >16-bit)  rather  than  bytes. The
significance of this is  that  each  line  must  then  begin on a WORD
boundary, plus that the  processor  type  must  be  taken into account
(which is given as the first word in the TIFF file header).


COMPRESSING IT

The formats described above are, in all cases, the formats which serve
as input for compression and output for decompression. All the
compression schemes (that I know of) work in a straightforward fashion
line by line. (In fact, none of them has even such a feature as the
line repeat of IMG.)



                 LZW (Lempel-Ziv & Welch) compression
                 ------------------------------------

LZW is used in all GIF (there is no such thing as a non-compressed GIF
file) and probably most TIFF files, and the implementation of it is
very similar in GIF and TIFF. In fact it should be, since the TIFF
designers acknowledge having essentially borrowed the LZW of GIF. In
spite of this there are a few differences.

Below, LZW is first described for all TIFF and 8-bit GIF images. The
minor modification for GIF with less than 8 bits/pixel is explained
after that.


THE STRING TABLE:

| At both encoding and decoding a table of byte-strings encountered in
| the image is used. This table is initialised to contain, as its 256
| first entries (0-255), every possible one-byte string; the rest of
| its strings are added during encoding/decoding, taken from the
| encoded/decoded image itself. A maximum of 4096 entries (0-4095) are
| used in GIF and TIFF LZW.

In practice I think the table is most efficiently made up just from
byte-counts plus pointers into the already processed image data.
Alternatively, especially during encoding, each string could be
represented by a reference to a previously used string plus one extra
byte - or, perhaps even better, the references could go the other way,
forming a tree structure to reduce the time spent on string
comparisons during encoding. (Another way of reducing string
comparisons is a technique called "hashing", whereby a formula is
defined to calculate a short numerical value for each possible string.
A simple, though not ideal, example is to add all the bytes of the
string and discard any overflow. String comparisons can then be
limited to strings with the same numerical value.)

COMPRESSION:

| The input (non-compressed) image  is  read  byte  by  byte. Each new
| byte is  "added"  (concatenated)  to  the  previous  one  to  form a
| growing byte string, as long  as  it  can  be  found in the table of
| already encountered strings.
| When not, the  existing  string  (before  concatenation)  is used as
| output encoded as its table entry number, the would-have-been string
| is added as a new table entry  after  the  last, and a new string is
| begun with the just read byte as its first byte.
| Before compression begins, the "current string" is initialised to a
| null string.
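
As a minimal sketch of the steps above (Python, with a made-up
function name): each table entry is kept as "code of an existing
string plus one extra byte", which is one of the representations
suggested earlier. The special Clear and EOI codes and the 4096-entry
limit are left out at this point, so new entries are numbered from
256 here.

```python
def lzw_encode_basic(data):
    """Return the list of table entry numbers for `data` (a bytes object).

    Plain LZW only: no Clear/EOI codes and no table size limit, so the
    first new entry gets number 256 in this sketch.
    """
    # Key: (code of an existing string, next byte) -> code of longer string.
    table = {}
    next_entry = 256
    codes = []
    current = None                  # None stands for the null string
    for byte in data:
        if current is None:
            current = byte          # every one-byte string is in the table
        elif (current, byte) in table:
            current = table[(current, byte)]    # the string keeps growing
        else:
            codes.append(current)               # output the existing string
            table[(current, byte)] = next_entry # add the would-have-been one
            next_entry += 1
            current = byte                      # begin a new string
    if current is not None:
        codes.append(current)       # flush the last string at end of input
    return codes
```

For the 7-byte input "ABABABA" this gives the four codes 65, 66, 256
and 258.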

DECOMPRESSION:

| The input (compressed) image is read  code  by code, using each code
| read as an index into the string  table.  A new entry, at the end of
| the table, is formed by using as  length the length of the looked up
| string PLUS ONE, and as  pointer  the  current output pointer BEFORE
| OUTPUT. Then the looked up string is  output, after which a new code
| is read. Thus  the  same  string  table  will  be  automatically re-
| constructed as was used at the time of encoding. (I assume here that
| string copying is done first  byte  first,  since  in some cases the
| string to output will be missing its last byte until it is filled in
| as the first copied byte - i.e. from the string beginning.)
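
The decoding steps can be sketched the same way (Python, hypothetical
function name). The table holds nothing but an offset into the output
and a length for each new entry, as described above. Again the Clear
and EOI codes are ignored, so new entries start at 256.

```python
def lzw_decode_basic(codes):
    """Rebuild the bytes from a list of plain LZW codes (no Clear/EOI)."""
    out = bytearray()
    entries = []        # (start offset in `out`, length) for 256, 257, ...
    for code in codes:
        start_of_output = len(out)      # output pointer BEFORE output
        if code < 256:
            length = 1
            out.append(code)
        else:
            src, length = entries[code - 256]
            # Copy first byte first: when the code refers to the newest
            # entry, its last byte is only written during this very copy.
            for i in range(length):
                out.append(out[src + i])
        # New entry: the string just output plus one byte still to come.
        entries.append((start_of_output, length + 1))
    return bytes(out)
```

Decoding the codes 65, 66, 256, 258 gives back "ABABABA". The last
code is an example of the special case: entry 258 is three bytes long
but only two of them exist when the copy starts.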


In GIF and TIFF LZW the first  two free table entries (right after the
one-byte strings) are  reserved,  since  the  corresponding codes have
special meanings:

 256 is the "Clear" code, to indicate  that the string table should be
     (re-)initialised and all  but  the  one-byte  strings cleared. It
     should be written as the first  of  all codes in any image (/TIFF
     strip) and can be used again later at any time.
 257 is the EndOfInformation (EOI) code to be written as the last code
     of the image (/TIFF strip).

So the first (2-byte) string actually encountered in the image will be
entered as 258.


The codes, corresponding to table entry  numbers, are to begin with 9-
bit numbers. Since no codes higher than 511 are possible to express in
9 bits, the code  size  is  increased  to  10  bits  when entry 512 is
created. The exact point  when  to  do  this  is  the first difference
between GIF and TIFF:

GIF  neatly does this no sooner than  exactly when needed. That is THE
     STEP AFTER WHEN ENTRY 512 is created  (the one in which entry 513
     is to be created). This is the first time that the CODE 512 could
     possibly be used/encountered.

TIFF rashly does it one step earlier.  I.e.  IN THE SAME STEP AS ENTRY
     512 is created or right  after  entry  511 has been created. This
     means that a few bits  are  unnecessarily  wasted in a TIFF file,
     but isn't perhaps much to make a fuss about.

Similarly, table entry 1024 marks  the  beginning  of 11-bit codes and
entry 2048 signals 12-bit codes.
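
One way to express the two rules is a function of the table entry
number about to be created. This is only a sketch of my reading of the
description above, seen from the encoder's side; the function name and
parameters are invented. Note that many decoders build their table one
entry behind the encoder and must adjust the comparison accordingly.

```python
def code_width(next_entry, tiff_style, min_width=9):
    """Width in bits of the code written in the step that will create
    table entry number `next_entry`.

    GIF style:  wide enough for the highest entry that already exists.
    TIFF style: wide enough for the entry about to be created.
    `min_width` is 9 for TIFF and 8-bit GIF.
    """
    highest = next_entry if tiff_style else next_entry - 1
    return max(min_width, min(12, highest.bit_length()))
```

With GIF rules the step creating entry 512 still writes a 9-bit code
and the following step writes 10 bits; with TIFF rules the step
creating entry 512 already writes 10 bits.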


The code numbers are  packed  in  consecutive  BYTES, not words, which
means that the compressed data don't have  to begin on a word boundary
(and that the TIFF processor  specification  can  be ignored). But how
this is done is the second difference between GIF and TIFF LZW.

TIFF Does it the logical way "left to right". I.e. codes ABCDEFGHI and
     jklmnopqr would be packed as:  ABCDEFGH Ijklmnop qr------

GIF, for pure spite against us  Motorola  programmers  and a desire to
     see us suffer, awkwardly packs the two codes above as:
     BCDEFGHI lmnopqrA ------jk.  As  can  be  seen  this forces us to
     reverse the byte order and then  read  the codes backwards to get
     it right.
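
Both packing orders are easy to sketch in a language with arbitrarily
long integers (Python here; the function names are my own). Each code
is given together with its current width in bits.

```python
def pack_codes_tiff(codes_and_widths):
    """Pack (code, width) pairs most significant bit first ("left to
    right"), zero-filling the unused bits of the last byte."""
    acc = 0
    nbits = 0
    out = bytearray()
    for code, width in codes_and_widths:
        acc = (acc << width) | code
        nbits += width
        while nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
        acc &= (1 << nbits) - 1          # keep only the bits not yet written
    if nbits:
        out.append((acc << (8 - nbits)) & 0xFF)
    return bytes(out)


def pack_codes_gif(codes_and_widths):
    """Pack (code, width) pairs least significant bit first."""
    acc = 0
    nbits = 0
    out = bytearray()
    for code, width in codes_and_widths:
        acc |= code << nbits             # new code goes above the old bits
        nbits += width
        while nbits >= 8:
            out.append(acc & 0xFF)
            acc >>= 8
            nbits -= 8
    if nbits:
        out.append(acc & 0xFF)
    return bytes(out)
```

Taking the two 9-bit codes %100000001 and %110000000 as ABCDEFGHI and
jklmnopqr, the TIFF packing gives the bytes $80 $E0 $00 and the GIF
packing gives $01 $01 $03, in agreement with the bit patterns shown
above.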


Neither GIF nor TIFF allows the table  to  go beyond entry 4095 or the
code length beyond 12 bits.  But  the  method  used to enforce this is
another difference between GIF and TIFF LZW:

TIFF simply puts a requirement on  the  ENCODER  to issue a Clear code
     before the problem arises.  This  must  then  be  done as soon as
     entry 4094 has been created  (If  we  wait  until after 4095, the
     Clear code itself could be misinterpreted to mark the addition of
     an entry 4096 and thus be read as 13-bit.)

GIF  on the other hand requires both  ENCODER AND DECODER to make sure
     that entry 4095 is the last to be added, and that the code length
     remains 12-bit.  Encoding/decoding  proceeds  normally  using the
     table entries already defined. No Clear code needs to be written.
     (To be sure some old GIF decoders  may  not adhere to this, so an
     encoder that wants  to  be  very  nice  could  issue  Clear codes
     anyway, after the creation of entry 4095.)

When a Clear IS written, the last  byte read before it (which couldn't
have been represented in  the  string  encoded  just before the Clear)
must be written after the  Clear  as  a  9-bit  code.  (Or it could be
written before it as a 12-bit number or whatever. If the latter method
is used in TIFF, the Clear must come right after the creation of entry
4093 at the latest.)
The decoding process, on encountering a Clear, only has to clear the
extra table entries and revert to 9-bit codes; it can then continue as
before.


The LZW of GIF and TIFF doesn't bother about whether or not encoded
strings stop at line ends. Only at the image end (strip end in TIFF)
will the compression algorithm stop - and add the EndOfInformation
code. The decoding process will not pause until it
finds this EOI.
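
Putting the special codes together, a decoder working on already
unpacked code numbers could look like the sketch below (Python; the
names are invented, and reading the codes out of the byte stream with
the right width is deliberately left out). The table is capped at
entry 4095 the GIF way; a correctly written TIFF stream never reaches
the cap because its encoder issues a Clear first.

```python
CLEAR, EOI, FIRST_FREE, LAST_ENTRY = 256, 257, 258, 4095


def lzw_decode_codes(codes):
    """Decode 8-bit GIF/TIFF style LZW codes, honouring Clear and EOI."""
    out = bytearray()
    entries = []        # (start offset in `out`, length) for 258, 259, ...
    for code in codes:
        if code == CLEAR:
            entries.clear()             # forget all but the one-byte strings
            continue
        if code == EOI:
            break
        start = len(out)
        if code < 256:
            length = 1
            out.append(code)
        else:
            src, length = entries[code - FIRST_FREE]
            for i in range(length):     # first byte first, see above
                out.append(out[src + i])
        if FIRST_FREE + len(entries) <= LAST_ENTRY:
            entries.append((start, length + 1))
    return bytes(out)
```

Because the table entries are only offsets into the output, a Clear
costs no more than emptying the list.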


LESS THAN 8-BIT GIF:

In GIF, unlike TIFF, not all the bits of each byte are used in images
with less than 8 bits/pixel (see the uncompressed format above). Thus
the number of possible one-byte strings is less than 256, and this is
used in GIF to reduce the initial code length and lower the number of
the first free table entry.

In the beginning of the image data in  a GIF file will always be found
a number X which is used as follows:

     Initial code length        =  X+1 bits
     Clear code                 =  1<<X
     EOI                        = (1<<X)+1
     First initially free entry = (1<<X)+2

When encoding a GIF file you simply set
     X = number of bits/pixel, BUT NEVER BELOW 2

If you try with X=8 you will arrive
at all the values given above:  9 bits, 256, 257 and 258 respectively.
For ST low rez you get (X=4):   5 bits,  16,  17 and  18.
And for ST high & medium (X=2): 3 bits,   4,   5 and   6.

The code length will change from 3  to  4  bits at table entry 8, to 5
bits at 16, 6 bits at 32, 7 bits  at  64, 8 bits at 128, 9 bits at 256
etc. Maximum code length is still 12 bits, and last entry 4095.
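
The four starting values follow directly from X, as this small sketch
shows (Python; the function and key names are made up):

```python
def gif_lzw_parameters(bits_per_pixel):
    """Starting values for GIF LZW, derived from X as listed above."""
    x = max(2, bits_per_pixel)          # X is never below 2
    clear = 1 << x
    return {
        "x": x,
        "initial_code_length": x + 1,
        "clear": clear,
        "eoi": clear + 1,
        "first_free": clear + 2,
    }
```

A monochrome image (1 bit/pixel) therefore gets the same values as a
4-colour image: 3 bits, 4, 5 and 6.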



                TIFF compression type 32773 - PackBits
                --------------------------------------

(The impressive type number seems  due  to it being originally defined
as a "private" type. It now however is in the general TIFF standard.)

The good old PackBits compression can be encountered only in TIFF
"bilevel" i.e. black&white images. Though in most cases less effective
than LZW, it is simple and very fast. The reason PackBits is not
foreseen to be used with colour images is probably that colour image
data - as described above - are never stored in bitplane separated
form in TIFF files, and PackBits can therefore not deal with them as
effectively as in IFF ILBM/Degas files.

It is exactly the same as  in  IFF  ILBM and Degas compressed, but for
the sake of completeness:

Data are stored as a series of commands, each led by a signed control
byte x:
     x >= 0        use next x+1 bytes as they are (1-128).
     x < 0         repeat the one next byte |x|+1 times (2-128).
     x = -128      not used.
No command may extend over more than one plane and line.
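
A decoder for these commands takes only a few lines. The following is
a sketch in Python with a made-up function name; it assumes the input
is complete and simply skips a control byte of -128.

```python
def unpack_bits(data):
    """Expand PackBits data (a bytes object) into the original bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        x = data[i] - 256 if data[i] > 127 else data[i]   # signed byte
        i += 1
        if x >= 0:
            out += data[i:i + x + 1]        # x+1 literal bytes
            i += x + 1
        elif x != -128:
            out += bytes([data[i]]) * (1 - x)   # |x|+1 copies of one byte
            i += 1
    return bytes(out)
```

For instance the six bytes 2, 1, 2, 3, $FD, 9 expand to 1, 2, 3
followed by four 9s.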



                       TIFF compression type 2
                       -----------------------
  CCITT Group 3 1-Dimensional Modified Huffman run length encoding.

This compression type too can be encountered only in TIFF "bilevel"
i.e. black&white images. And even for these, PackBits or LZW are
preferred in the general case. Its speciality is images with large
areas of solid white interrupted by short runs of solid black (such as
scanned text, line drawings etc.). These will be compressed very
effectively indeed, but it's not fast and the result for patterned
images can be awful (many times bigger than the original at worst).


White and black pixel runs are  encoded as commands of variable length
(2-13 bits) made up so that  each  command is uniquely determined when
its bits have been read (see tables below). These (I assume) should be
packed into bytes similar  to  how  LZW  codes  are  treated. New rows
always begin on byte boundaries. A different  set of codes is used for
each colour (since white runs are expected to be on the average longer
than the black runs).
Whether ones or zeroes should be encoded  as white or black is decided
by the PhotometricInterpretation field (see TIFF descriptions below).

Each line always begins with a white run (of zero length if the actual
line beginning  is  black),  after  which  black  and  white  runs are
expected to alternate. The sum of  all  run lengths for each line must
equal ImageWidth.

For run lengths < 64  pixels,  a  single  code  is  used (which can as
mentioned be as long as  13  bits). The same less-than-64-pixels codes
are used for longer run lengths too,  but  are then preceded by one or
more "make-up" codes, each representing up to 2560 pixels.
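
The bookkeeping described so far, before any bit codes are looked up,
can be sketched like this (Python; both function names are invented).
The first function turns a row of pixels into alternating run lengths
beginning with white, the second splits one run length into the
make-up values and the final 0-63 value whose codes are to be written.

```python
def row_to_runs(row, white=0):
    """Return alternating run lengths for a row of bilevel pixels,
    always beginning with a (possibly zero-length) white run."""
    runs = []
    colour = white
    count = 0
    for pixel in row:
        if pixel == colour:
            count += 1
        else:
            runs.append(count)
            colour = pixel
            count = 1
    runs.append(count)
    return runs


def split_run(run_length):
    """Split a run length into make-up values (multiples of 64, at most
    2560 each) followed by one terminating value in the range 0-63."""
    parts = []
    while run_length >= 2560:
        parts.append(2560)
        run_length -= 2560
    if run_length >= 64:
        parts.append(run_length - run_length % 64)
        run_length %= 64
    parts.append(run_length)
    return parts
```

A run of exactly 64 pixels thus becomes the make-up code for 64
followed by the code for a run of 0.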


 Run    White    Black
length  code     code
 ----   ----     ----

   0  00110101   0000110111
   1  000111     010
   2  0111       11
   3  1000       10
   4  1011       011
   5  1100       0011
   6  1110       0010
   7  1111       00011
   8  10011      000101
   9  10100      000100
  10  00111      0000100
  11  01000      0000101
  12  001000     0000111
  13  000011     00000100
  14  110100     00000111
  15  110101     000011000
  16  101010     0000010111
  17  101011     0000011000
  18  0100111    0000001000
  19  0001100    00001100111
  20  0001000    00001101000
  21  0010111    00001101100
  22  0000011    00000110111
  23  0000100    00000101000
  24  0101000    00000010111
  25  0101011    00000011000
  26  0010011    000011001010
  27  0100100    000011001011
  28  0011000    000011001100
  29  00000010   000011001101
  30  00000011   000001101000
  31  00011010   000001101001
  32  00011011   000001101010
  33  00010010   000001101011
  34  00010011   000011010010
  35  00010100   000011010011
  36  00010101   000011010100
  37  00010110   000011010101
  38  00010111   000011010110
  39  00101000   000011010111
  40  00101001   000001101100
  41  00101010   000001101101
  42  00101011   000011011010
  43  00101100   000011011011
  44  00101101   000001010100
  45  00000100   000001010101
  46  00000101   000001010110
  47  00001010   000001010111
  48  00001011   000001100100
  49  01010010   000001100101
  50  01010011   000001010010
  51  01010100   000001010011
  52  01010101   000000100100
  53  00100100   000000110111
  54  00100101   000000111000
  55  01011000   000000100111
  56  01011001   000000101000
  57  01011010   000001011000
  58  01011011   000001011001
  59  01001010   000000101011
  60  01001011   000000101100
  61  00110010   000001011010
  62  00110011   000001100110
  63  00110100   000001100111

Make-up codes (must be followed by one of the above codes)

  64  11011      0000001111
 128  10010      000011001000
 192  010111     000011001001
 256  0110111    000001011011
 320  00110110   000000110011
 384  00110111   000000110100
 448  01100100   000000110101
 512  01100101   0000001101100
 576  01101000   0000001101101
 640  01100111   0000001001010
 704  011001100  0000001001011
 768  011001101  0000001001100
 832  011010010  0000001001101
 896  011010011  0000001110010
 960  011010100  0000001110011
1024  011010101  0000001110100
1088  011010110  0000001110101
1152  011010111  0000001110110
1216  011011000  0000001110111
1280  011011001  0000001010010
1344  011011010  0000001010011
1408  011011011  0000001010100
1472  010011000  0000001010101
1536  010011001  0000001011010
1600  010011010  0000001011011
1664  011000     0000001100100
1728  010011011  0000001100101

          White&Black
1792      00000001000
1856      00000001100
1920      00000001101
1984      000000010010
2048      000000010011
2112      000000010100
2176      000000010101
2240      000000010110
2304      000000010111
2368      000000011100
2432      000000011101
2496      000000011110
2560      000000011111



                       The GIF 'Logical screen'
                       ------------------------

A general GIF file can contain more than one image, each of which is
to be placed within a 'logical screen'. This is simply an area large
enough to hold any and all of the images. The images are supposed to
be written to the logical  screen  in  the order they appear, possibly
overwriting earlier images,  without  pausing.  (These  rules could be
altered - see the - v. 89a - Graphic Control Extension below.)

Each image could also have its  own local palette defined (though this
is probably rarely used), in which  case  the global palette should be
restored as soon as the image has been done.

The  concept  of  the  logical  screen,  with  multiple  'sub-images',
certainly makes life a bit  more  difficult  for  programs that are to
properly handle any GIF file. However,  I think most GIF files contain
only single images and that the logical screen can be ignored.



                  GIF (Graphics Interchange Format)
                  ---------------------------------

In the GIF description I have used the  term 'Iword' for 2 bytes to be
read as a word, Intel style,  Least  Significant Byte first. (An Iword
in a GIF file doesn't have to begin on a word boundary.)

3 bytes   'GIF' - signature
3 bytes   '87a'  (or  '89a'  or  later  if,  more  or  less  peculiar,
          extension blocks are used, see below) - version

------- 'LOGICAL SCREEN' (or global) DESCRIPTOR (7 bytes):
1 Iword   'Screen' width     These two values define an area large
1 Iword   'Screen' height    enough to hold all images of the file.
1 byte    Bit  7:   Flag for a global colour map following descriptor
          Bits 6-4: 3-bit number to be increased by 1 to get number of
                    bits/PrimaryColour (E.g. ST:2, STE:3, Falcon:5)
          Bit  3:   Flag for sorted palette. If set, the most
                    important (frequent?) colours come first.
          Bits 2-0: 3-bit number to be increased by 1 to get number of
                    bits/pixel (E.g. mono:0, ST low:3, 256colour:7)
1 byte    Colour index of screen background.  (To  be used with Global
          colour map if one is specified.)
1 byte    Pixel aspect ratio: 0=No info given.  1-255 = value X to put
          into the equation: Aspect  ratio  (Width:Height) = (X+15)/64
          (E.g. 49 for perfect square pixels).

~~~~~~ If flagged for: GLOBAL COLOUR MAP (#entries = 1 << #bits/pixel)
3n bytes  3  bytes  per  colour:  red,   green,  blue   8  bit  colour
          components (0-255). Exactly like the palette in for instance
          IFF ILBM images.

------- Any number of Images (possibly none) according to:
          ~~~~~~~ Possible Extension block(s), each according to:
          1 byte   '!' (ascii exclamation mark)
          1 byte   function code
          1 byte   R (1-255): # of bytes in data sub-block
          R bytes
          1 byte   S (1-255): # of bytes in data sub-block
          S bytes
          etc.
          1 byte   0 marks the end of extension block

          ------- IMAGE DESCRIPTOR (10 bytes):
          1 byte   ',' (ascii comma)
          1 Iword  X-offset for image on 'logical screen'
          1 Iword  Y-offset for image on 'logical screen'
          1 Iword  Width of the image in pixels
          1 Iword  Height of the image in pixels
          1 byte   Bit  7:   Flag for a local colour map following
                             descriptor (otherwise use global map)
                   Bit  6:   If set: Image rows in interlaced order
                             Otherwise in sequential order.
                   Bit  5:   Flag for sorted palette. If set, the most
                             important (frequent?) colours come first.
                   Bits 4-3: 0
                   Bits 2-0: If bit 7 is set, this is a 3-bit number
                             to be increased by 1 to get number of
                             bits/pixel (determining palette length)
          ~~~~~~ If flagged for: LOCAL COLOUR MAP
          3n bytes
          ------- RASTER DATA
          1 byte   X = initial LZW code length minus 1 (= #bits/pixel,
                   except for black&white images where X=2) - see LZW
                   above.
          1 byte   R (1-255): # of bytes of LZW packed data
          R bytes
          1 byte   S (1-255): further # of bytes of LZW packed data
          S bytes
          1 byte   T (1-255): further # of bytes of LZW packed data
          T bytes
          etc. until all data done.
          1 byte   0 = End of image data

                   The division of image  data  into sub-blocks has no
                   bearing on how  to  perform  the decompression, and
                   all sub-blocks  could  well  be  combined  into one
                   continuous data string before decompressing it.

                   The pixel rows of the decompressed image are stored
                   sequentially top to  bottom  if  the interlace flag
                   (bit 6 of byte 10  in image descriptor) is cleared.
                   If not they are stored interlaced in four passes:
                   1. Every 8th row starting with row 0
                   2. Every 8th row starting with row 4
                   3. Every 4th row starting with row 2
                   4. Every 2nd row starting with row 1
                   See above for more about uncompressed GIF images.
          -------
~~~~~~~ Possible Extension block(s), each according to:
1 byte    '!'
1 byte    function code
1 byte    R (1-255): # of bytes in data sub-block
R bytes
1 byte    S (1-255): # of bytes in data sub-block
S bytes
etc.
1 byte    0 marks the end of extension block
------
1 byte    ';' (ascii semicolon) GIF Terminator
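
The four interlace passes listed under RASTER DATA translate into a
simple table of screen row numbers. A sketch in Python (the function
names are my own):

```python
def interlaced_row_order(height):
    """Screen row number of each stored row in an interlaced GIF image."""
    order = []
    for start, step in ((0, 8), (4, 8), (2, 4), (1, 2)):
        order.extend(range(start, height, step))
    return order


def deinterlace(rows):
    """Rearrange decompressed rows from stored order to top-to-bottom."""
    result = [None] * len(rows)
    for stored_index, screen_row in enumerate(interlaced_row_order(len(rows))):
        result[screen_row] = rows[stored_index]
    return result
```

For an image 10 rows high the rows are stored in the order 0, 8, 4,
2, 6, 1, 3, 5, 7, 9.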


When reading a GIF file a program should always be prepared to
encounter extension blocks it doesn't understand. These can always be
skipped and are no reason for the program to give up. Neither should a
program give up if the GIF version is unknown; it should try to do its
best with it.
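
A reader following this advice could walk through the file block by
block as in the sketch below. It is Python using only the standard
struct module; the function names and the tuples it reports are
invented for illustration. It accepts any version string, skips every
extension block without looking inside, and steps over colour maps and
raster data by their lengths alone.

```python
import struct


def skip_sub_blocks(data, pos):
    """Skip a chain of data sub-blocks; return the position just after
    the zero byte that ends the chain."""
    while data[pos] != 0:
        pos += 1 + data[pos]
    return pos + 1


def walk_gif(data):
    """Yield (kind, detail) tuples for the blocks of a GIF file in `data`."""
    if data[:3] != b"GIF":
        raise ValueError("not a GIF file")
    yield ("version", data[3:6].decode("ascii"))
    width, height, flags, background, aspect = struct.unpack_from("<HHBBB", data, 6)
    yield ("screen", (width, height))
    pos = 13
    if flags & 0x80:                        # global colour map follows
        pos += 3 * (1 << ((flags & 7) + 1))
    while True:
        marker = data[pos]
        if marker == 0x21:                  # '!' = extension block
            yield ("extension", data[pos + 1])
            pos = skip_sub_blocks(data, pos + 2)
        elif marker == 0x2C:                # ',' = image descriptor
            x, y, w, h, iflags = struct.unpack_from("<HHHHB", data, pos + 1)
            yield ("image", (x, y, w, h, bool(iflags & 0x40)))
            pos += 10
            if iflags & 0x80:               # local colour map follows
                pos += 3 * (1 << ((iflags & 7) + 1))
            pos = skip_sub_blocks(data, pos + 1)    # +1: the code size byte
        elif marker == 0x3B:                # ';' = GIF terminator
            return
        else:
            raise ValueError("unexpected block marker %d" % marker)
```

The last element reported for an image is the interlace flag. A real
loader would of course collect the sub-block contents instead of
skipping them.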

When writing a GIF file, if  no  extension blocks are used the version
should be set to  "87a".  If  any  of  the  blocks  below are used the
version should be set to  "89a".  (The  Sorted  palette flag and Pixel
aspect ratio were also added with 89a  -  zeroed  in 87a -, but are in
themselves, as I understand it, no  reason  to bump the version number
up above 87a.)



                 Extension blocks of GIF version 89a
                 -----------------------------------

     COMMENT
     Recommended to be used only  in  the  beginning (after the global
     colour map) or end (before the GIF Terminator) of file.
1 byte    '!'
1 byte    254 = Comment Extension
1 byte    R (1-255): # of bytes in data sub-block
R bytes   ASCII text (preferably limited to characters 32-127)
1 byte    S (1-255): # of bytes in data sub-block
S bytes   ASCII text (preferably limited to characters 32-127)
etc.
1 byte    0 marks the end of extension block

     PLAIN TEXT
     This block can effectively replace a normal image (image
     descriptor + local colour map + raster data). It represents a
     simple image composed of ascii characters, NOT compressed, and
     always uses global colours.
1 byte    '!'
1 byte    1 = Plain Text Extension
1 byte    12: # of bytes in data sub-block
1 Iword   X-offset in pixels for image on 'screen'
1 Iword   Y-offset in pixels for image on 'screen'
1 Iword   Image width in pixels   (Should be multiple of char. width)
1 Iword   Image height in pixels  (Should be multiple of char. height)
1 byte    Character cell width in pixels  (8 recommended)
1 byte    Character cell height in pixels (8 or 16 recommended)
1 byte    Foreground colour index (into global colour map)
1 byte    Background colour index (into global colour map)
1 byte    R (1-255): # of bytes in data sub-block
R bytes   Ascii character data
1 byte    S (1-255): # of bytes in data sub-block
S bytes   Ascii character data
etc.
1 byte    0 marks the end of extension block

     GRAPHIC CONTROL
     This block affects next graphic  block (Image Descriptor or Plain
     Text Extension):
1 byte    '!'
1 byte    249 = Graphic Control Extension
1 byte    4 (length of data sub-block)
1 byte    Bits 7-5: 0
          Bits 4-2: Code (0-7) for what to do after display
                             0: Not defined, 1: Leave image
                             2: Replace with background colour
                             3: Restore to previous, 4-7: reserved.
          Bit  1:   If set, wait for user input after display
          Bit  0:   Flag for Transparency index given
1 Iword   Delay time, if not 0, in hundredths of a second to wait
          after display. (If waiting for user input as well, the wait
          ends at whichever happens first.)
1 byte    Transparent colour index (only used if bit 0 above is set).
1 byte    0 marks the end of extension block
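
Picking the Graphic Control fields apart is a matter of a few shifts
and masks. A sketch in Python (the function name and dictionary keys
are made up; `block` is the 4-byte data sub-block without its length
byte):

```python
import struct


def parse_graphic_control(block):
    """Decode the 4 data bytes of a Graphic Control Extension."""
    flags, delay, transparent = struct.unpack("<BHB", block)
    return {
        "disposal": (flags >> 2) & 7,           # bits 4-2
        "wait_for_user": bool(flags & 2),       # bit 1
        "delay_seconds": delay / 100,           # Iword, hundredths of a second
        "transparent_index": transparent if flags & 1 else None,
    }
```

With the data bytes 9, 50, 0, 3 this reports disposal code 2 (replace
with background colour), no wait for user input, a delay of half a
second and transparent colour index 3.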

     APPLICATION
     The meaning of this block is for the application to define.
1 byte    '!'
1 byte    255 = Application Extension
1 byte    11: # of bytes in data sub-block
8 bytes   Application Identifier (8 Ascii characters)
3 bytes   Application Authentication Code (binary code)
1 byte    R (1-255): # of bytes in data sub-block
R bytes
1 byte    S (1-255): # of bytes in data sub-block
S bytes
etc.
1 byte    0 marks the end of extension block
--------------------------------------------



                     TIFF (Tag Image File Format)
                     ----------------------------

1 word    'MM' for Motorola format or 'II' for Intel format.
          In a file in  Intel  format  the  order  of  bytes has to be
          reversed in every word and longword value.
1 word    =42. ID for TIFF  file.  (Actually  a  'version' number that
          will however probably never change.)
1 long    Offset (relative to file  start)  to  the (first) Image File
          Directory. (Always on a word boundary.)

Each IFD has the following format:
1 word    Number of entries
~~~~~~~~~~~~~~~~~~~~~ For each entry:
          1 word   Tag for  the  field  (unsigned).  Entries  must  be
                   sorted - in ascending order - by this.
          1 word   Field type (information units):
                   1=BYTES (unsigned byte integers)
                   2=ASCII (null terminated ascii byte string)
                   3=WORDS (16-bit unsigned integers - called SHORT)
                   4=LONGS (32-bit unsigned integers)
                   5=RATIONALS (double longs: numerator, denominator)
          1 long   Number of - according to type - bytes, words, longs
                   or double longs.  Type  2  (ASCII)  is  measured in
                   bytes including the ending null.
          1 long   Value or - if this does not fit into 4 bytes - offset
                   (rel. to file start) to  Value.  Use field type and
                   length to determine if a  value  fits into 4 bytes.
                   Note that byte, word  and  string values are stored
                   in FIRST byte(s)/word of  this  longword, when they
                   fit into it.
~~~~~~~~~~~~~~~~~~~~~
1 long    Offset (rel. to file start) to next IFD or zero.


That's it! So where are the  image  data?  Well, the pointers to these
are held as the contents of a  StripOffsets field (with tag 273). This
and other TIFF fields are described below.
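
Reading the header and one IFD could be sketched as follows (Python
with the standard struct module; the function name and the shape of
the result are my own choices). Only the five field types listed above
are known to this sketch, and field values are returned as raw bytes
in the file's own byte order.

```python
import struct

TYPE_SIZES = {1: 1, 2: 1, 3: 2, 4: 4, 5: 8}     # bytes per unit of each type


def read_ifd(data, offset=None):
    """Return ({tag: (type, count, raw value bytes)}, offset of next IFD).

    With offset=None the first IFD named in the file header is read.
    """
    order = {b"MM": ">", b"II": "<"}[data[:2]]  # Motorola or Intel order
    magic, first_ifd = struct.unpack_from(order + "HL", data, 2)
    if magic != 42:
        raise ValueError("not a TIFF file")
    if offset is None:
        offset = first_ifd
    (count,) = struct.unpack_from(order + "H", data, offset)
    fields = {}
    for i in range(count):
        entry = offset + 2 + 12 * i             # each entry is 12 bytes
        tag, ftype, n = struct.unpack_from(order + "HHL", data, entry)
        size = TYPE_SIZES[ftype] * n
        if size <= 4:
            start = entry + 8                   # value held in the entry itself
        else:
            (start,) = struct.unpack_from(order + "L", data, entry + 8)
        fields[tag] = (ftype, n, data[start:start + size])
    (next_ifd,) = struct.unpack_from(order + "L", data, offset + 2 + 12 * count)
    return fields, next_ifd
```

Because a short value occupies the FIRST bytes of the 4-byte value
slot, slicing from the start of the slot works for both byte orders.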

IFDs,  field  values  and   image   strip   data,   can  be  scattered
indiscriminately all over  the  file  in  any  order,  as  long as all
pointers are right. The  only  restriction  is  that  they all must be
preceded by the 8-byte file header.

NOTE: The official TIFF documentation explicitly requires all IFDs
and field contents (but normally  not  image  strips) to start on word
boundaries (even addresses), and  I  don't  think  a program should be
required to deal with anything else. However,  I have seen a TIFF file
(in 'II' format) violating this rule, so it may be a good idea to make
the program at least  prepared  to  exit  gracefully  if it encounters
words/longs on odd addresses.

If there is more than one IFD, each IFD defines a "subfile" or
"sub-image", somehow related to the main image (e.g. transparency
masks or versions in lower resolutions). The main image is always
represented by the first IFD.

A TIFF file reader must be prepared  to encounter, but doesn't have to
be able to use, subfiles and fields other than the required ones.

A TIFF file editor must omit, from the re-saved file, any fields
and/or subfiles that it doesn't understand and therefore cannot modify
to correspond to the changes made to the (main) image.

Tag numbers and enumerated values (e.g. compression types) of 32768
($8000 in hex) and above are 'private', i.e. defined by organizations
for their own, possibly secret, purposes, and registered with the
administrators of TIFF (Microsoft and Aldus). PackBits compression,
with number 32773, seems to have been introduced into the TIFF system
this way, but is now a general standard.



                             TIFF classes
                             ------------

TIFF classes are subsets  of  the  full  system,  each  dealing with a
particular kind of image, and each  therefore requiring only a smaller
part of the wide variety of existing fields.

The classes defined are
  B (Bilevel i.e. black & white images)
  G (Greyscale images)
  P (Palette colour images)
  R (RGB direct colour images)
  F (Fax documents)  Subclass of B (All its requirements + some more)

The idea is to make it possible  to write a program capable of dealing
with only one or two  TIFF  classes.  For  instance a program that can
cope  only  with  black&white  and  palette  colour  images  could  be
described as coping with TIFF B,P pictures.

I don't list the fields needed with each TIFF class separately though,
since I think this should be clear from the field descriptions below.

The exception is TIFF F, for  which  I haven't got enough information.
(In particular I don't know how  TIFF  compression types 3 and 4, used
with fax documents, work.)



                          Fields of TIFF 5.0
                          ------------------

Below are listed (some of) the fields that can be found in a TIFF
file. For each field, the first line gives the tag number and the
definers' name for the field; then follow the information unit type,
the number of units and finally a description. The fields are not
listed in tag number order (as they must be in a file).

Many fields have DEFAULT VALUES. When  writing a TIFF file such fields
can optionally be left out if the default is the desired value.



                             Core fields
                             -----------

These fields should,  when  writing  TIFF  files,  probably  always be
included where they are relevant  for  the  type  of image (but fields
containing default values  need  never  be  explicitly  written). When
reading, all the important data should be found within them.


254 NewSubfileType   (In older TIFF files there was a 255 SubfileType)
     Type = 4 (long)
        # = 1
     Value is a longword to be read as 32 flag bits. Unused bits = 0.
     Bit  If set:
      0   Reduced resolution version of another image in TIFF file.
      1   Single page of a multi-page image (e.g. fax document).
      2   Image is a mask  for  another  image  in  this TIFF file. In
          which case PhotometricInterpretation should be 4.
     Default is 0 (No bits set).


262 PhotometricInterpretation
     Type = 3 (word)
        # = 1
     0 =  Bilevel (mono) or greyscale image: 0=white (and max.=black).
     1 =  Bilevel or greyscale image: 0=black (and maximum=white).
          (Whether an image is  greyscale  or  bilevel  can be deduced
          from BitsPerSample.  Note  that  -  at  least  for greyscale
          images -  any  GrayResponseCurve  field  will  override this
          field - see below.)

          0 is normal for Atari mono  resolutions (and default in mono
          IMG files), but 1 is recommended by the definers of TIFF.

     2 =  RGB. 0 is minimum intensity, max. of each sample is white.

     3 =  Palette colour image. Each sample is an index into ColorMap.
          (But see below about palette colour images in RGB format.)

     4 =  Image is a mask (Bilevel  only).  1-bits represent pixels to
          use, zeroes should be  skipped.  The  mask  should of course
          have the same ImageWidth and ImageLength as the main image.

     No default.


277 SamplesPerPixel
     Type = 3 (word)
        # = 1
     Number of 'samples' per pixel.  The  sample(s) is/are the numeric
     value(s) for each  pixel.  So  SamplesPerPixel  should  be  1 for
     bilevel (black&white), greyscale, and  palette colour images, and
     3 for RGB images. Default = 1.


258 BitsPerSample
     Type = 3 (words)
        # = SamplesPerPixel
     Number of bits per sample.  (E.g.  for  black&white 1, for ST low
     rez 4 and for Falcon High Colour 5,6,5.)
     It is normally recommended to use powers of 2 (1, 2, 4 or 8), to
     make pixel boundaries conform with byte boundaries as far as
     possible - thus extending 3 to 4 and 5-7 to 8, using fill bits in
     the image data. In the case of the 16-bit High Colour, though, if
     PlanarConfiguration=1 (see below) each pixel  will fit in exactly
     2 bytes without any extension with fill bits.
     Default = 1 (black & white).


320 ColorMap
     Type = 3 (words)
        # = 3*(1<<BitsPerSample)
     Palette (for palette colour images) in the form of THREE SEPARATE
     TABLES. First come the  red  components  for  all  colours of the
     palette, then the greens followed by the blues.

     Each component is a number  0-65535  (0-$FFFF).  So that black is
     represented by 0,0,0 and white by 65535,65535,65535.

     No default. Must be included in all palette colour images.
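
Since the three-table layout differs from the interleaved palettes of
GIF and IFF ILBM, a conversion step is usually needed. A minimal
Python sketch follows; the name colormap_to_rgb is invented here, and
reducing each component to 8 bits by keeping its high byte is an
assumption of this example, not something required by TIFF.

```python
def colormap_to_rgb(colormap):
    """Convert a TIFF ColorMap (three tables in a row) to (r, g, b) triples.

    colormap is the flat sequence of 3*N words read from the field.
    Each 16-bit component is reduced to 8 bits by keeping its high byte.
    """
    n = len(colormap) // 3
    reds = colormap[0:n]              # first table: all red components
    greens = colormap[n:2 * n]        # second table: all greens
    blues = colormap[2 * n:3 * n]     # third table: all blues
    return [(r >> 8, g >> 8, b >> 8)
            for r, g, b in zip(reds, greens, blues)]
```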


284 PlanarConfiguration
     Type = 3 (word)
        # = 1
     This field is relevant only for RGB direct colour images.
     1 =  The sample values for  each  pixel  are stored contiguously:
          RGBRGBRGB...
     2 =  The samples are  stored  in  separate  "sample planes", each
          with its separate set  of  StripOffsets and StripByteCounts.
          The red plane StripOffsets  (and StripByteCounts) are stored
          first, followed by all the green and, last, the blue ones.
     Default is 1.


256 ImageWidth
     Type = 4 or 3 (long or word).  4 (long) recommended.
        # = 1
     Image width in pixels. No default.


257 ImageLength
     Type = 4 or 3 (long or word).  4 (long) recommended.
        # = 1
     Image height ('length') in pixels (number of rows). No default.


278 RowsPerStrip
     Type = 4 or 3 (long or word).  4 (long) recommended.
        # = 1
     Number of pixel rows per  "strip".  The organization of the image
     data into  strips  is  primarily  intended  for  fast  access  to
     individual rows when the  data  is  compressed, but can sometimes
     simplify reading of uncompressed images as well.

     RowsPerStrip is recommended to be  set  so  as to make each strip
     about 8 K in size (if  possible).  E.g.  for an ST low/medium rez
     screen 50 would be a suitable RowsPerStrip value, and for ST high
     rez 100, to split each ST  screen  into four strips of 8000 bytes
     each. VGA 256-colour (8-bit) 640x480 pictures are typically split
     into 40 12-row strips of 7680 bytes each.

     Total number of strips in image can be calculated as:
     StripsPerImage = (ImageLength-1)/RowsPerStrip + 1 truncated

     Default RowsPerStrip is $FFFFFFFF (effectively infinity).
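
The strip arithmetic can be sketched in Python as follows. The
function names are made up for this example; the second function
covers the case where the last strip holds fewer rows than the
others.

```python
def strips_per_image(image_length, rows_per_strip):
    """Number of strips, using the truncating formula given above."""
    return (image_length - 1) // rows_per_strip + 1

def rows_in_strip(index, image_length, rows_per_strip):
    """Rows held by strip number index (0-based); the last may be short."""
    first_row = index * rows_per_strip
    return min(rows_per_strip, image_length - first_row)
```

With the default RowsPerStrip of $FFFFFFFF the formula gives a single
strip holding the whole image.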


273 StripOffsets
     Type = 4 or 3 (longs or words).  4 (longs) recommended.
        # = StripsPerImage   except if PlanarConfiguration=2 (RGB)
                             when # = StripsPerImage*SamplesPerPixel
     Offset (relative to TIFF file start) to each strip. (The data for
     different strips can be scattered arbitrarily over the file.)
     BY THIS FIELD ONLY, THE IMAGE DATA CAN BE FOUND. No default.

     For the organization of the image  data see the beginning of this
     file.


279 StripByteCounts
     Type = 3 or 4 (words or longs).  3 (WORDS) is recommended.
        # = StripsPerImage   except if PlanarConfiguration=2 (RGB)
                             when # = StripsPerImage*SamplesPerPixel
     Number of bytes in each strip (as  it  is stored in the file, NOT
     as it would be if  uncompressed).  This  field exists to simplify
     buffering of compressed data and  should  be included in new TIFF
     files (although it may not be present in older ones). No default.
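
Given the values of StripOffsets and StripByteCounts, collecting the
(still compressed) strip data is straightforward. In this Python
sketch the name read_strips is invented, and the file is assumed to
be in memory as a bytes object with both fields present.

```python
def read_strips(data, strip_offsets, strip_byte_counts):
    """Return the stored bytes of each strip, in strip order."""
    return [data[offset:offset + count]
            for offset, count in zip(strip_offsets, strip_byte_counts)]
```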


259 Compression
     Type = 3 (word)
        # = 1
     1 =  No compression.

  32773 = PackBits compression.  ONLY  FOR  BILEVEL  (i.e. black&white
          "mono") images including masks, where  it is the recommended
          compression type (and by far the fastest).

     2 =  CCITT Group  3  1-Dimensional  Modified  Huffman  run length
          encoding. ONLY FOR BILEVEL IMAGES.

          Optimized for images with large areas of solid white
          interrupted by short runs of solid black (e.g. scanned
          text) but slow and very bad with patterned images.

          If you want  to  officially  claim,  for  your  program, the
          capability to handle TIFF mono  images,  it  must be able to
          deal with this compression  type.  The generally recommended
          TIFF B compression scheme is PackBits though, or LZW.

     5 =  LZW Compression. The only TIFF compression scheme (except 1=
          uncompressed) to be used with colour or greyscale images. It
          could probably be used with bilevel images as well.

          Very adaptive (achieving very  good  results with many types
          of images) since it builds  up  a  table of byte strings got
          from the image itself to  be  replaced with short (9-12 bit)
          codes.

     For further descriptions see the beginning of this file.

     There are also compression types 3  and  4 defined, but these are
     for use with fax documents only and  need not be known even by an
     otherwise fully TIFF capable program. (And  I don't know how they
     work anyway.)

     Default = 1 (no compression).


317 Predictor
     Type = 3 (word)
        # = 1
     When Compression=5 (LZW) a reader  should  check for the presence
     of  this  field,   telling   whether   or   not   a  "prediction"
     (preprocessing) scheme has been used before coding.

     1 =  No prediction. Coding was done directly on unmodified image.

     2 =  Horizontal differencing. Sample(s) for  first  pixel of each
          line is unchanged, but all  other sample values are replaced
          with the difference  between  this  pixel's  sample  and the
          sample for the previous pixel  (this  one minus the previous
          one). BitsPerSample  is  thereby  retained  (negative values
          stored as two's complement), and in  RGB images Red is to be
          subtracted from Red, Green from Green and Blue from Blue.

          Doesn't do much good to  a  palette colour or bilevel image,
          but can be useful with True  Colour and some 8-bit greyscale
          images.

     Default is 1 (no prediction)
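
Undoing horizontal differencing after LZW unpacking could look like
the Python sketch below. It assumes that one row has already been
unpacked into a list of sample values in pixel order
(PlanarConfiguration=1) and that all samples have the same bit depth;
the function name is invented for this example.

```python
def undo_horizontal_differencing(row, samples_per_pixel, bits_per_sample=8):
    """Restore one decompressed row of samples stored with Predictor=2.

    row is a list of sample values in pixel order (RGBRGB... for RGB).
    """
    mask = (1 << bits_per_sample) - 1
    out = list(row)
    for i in range(samples_per_pixel, len(out)):
        # Add the same sample of the previous pixel, wrapping around
        # so that two's complement differences come out right.
        out[i] = (out[i] + out[i - samples_per_pixel]) & mask
    return out
```

For example, the greyscale differences 10, 2, 255, 0 (255 being -1 in
8-bit two's complement) are restored to 10, 12, 11, 11.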


282 XResolution
     Type = 5 (rational, ie 2 longs: first numerator then denominator)
        # = 1
     Number of pixels per ResolutionUnit  (see  below) in X direction.
     No default.

283 YResolution
     Type = 5 (rational)
        # = 1
     Number of pixels per ResolutionUnit in the Y direction.
     No default.

296 ResolutionUnit
     Type = 3 (word)
        # = 1
     To be used with XResolution and YResolution.
     1 = No absolute unit.
     2 = Inch.
     3 = Centimetre.
     Although 1 can always be used  when  creating an image file where
     only the 'shape' (aspect) but not size (density) of the pixels is
     determined/important, the recommended practice  in  such cases is
     to use  inches  or  centimetres  and  then  pick  XResolution and
     YResolution so as  to  make  the  image  about  10 centimetres (4
     inches) wide and/or high. Default is 2 (Inch).
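
The recommended practice can be sketched in Python as below, picking
rationals that make the image 4 inches wide while keeping the pixel
shape. The function name and its parameters are invented for this
example; pixel_w and pixel_h give the relative width and height of
one pixel (e.g. 1 and 2 for a mode with pixels twice as tall as they
are wide).

```python
from fractions import Fraction

def pick_resolution(width, pixel_w=1, pixel_h=1, inches=4):
    """Return (XResolution, YResolution) as (numerator, denominator)."""
    x_res = Fraction(width, inches)               # pixels per inch in X
    y_res = x_res * Fraction(pixel_w, pixel_h)    # tall pixels: fewer per inch
    return ((x_res.numerator, x_res.denominator),
            (y_res.numerator, y_res.denominator))
```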



                 Palette colour images in RGB format
                 -----------------------------------

318 ColorImageType
     Type = 3 (word)
        # = 1
     This can be used in  RGB  images  which  were actually created as
     palette images, or  that  for  some  other  reason  use a greatly
     restricted range of colours, usually at most 256 colours. (Though
     there may be borderline cases.)
     1 =  Continuous tone, natural image.
     2 =  Synthetic image, using a very restricted range of colours.
     Default is 1.

319 ColorList
     Type = 1 or 3 (bytes or words)
        # = palette length * SamplesPerPixel (where
     (Only in RGB images with a ColorImageType field =2.)
     A palette of the colours actually  used  in an RGB image. The RGB
     values for each  colour  stored  in  consecutive  bytes (like the
     palette used in GIF or IFF ILBM images) or words. The colours can
     be stored in any order. No default.

The rationale behind the fields  ColorImageType  and ColorList is that
any and all palette colour images can be stored as RGB images instead,
to save image readers the  need  to  be  able  to  deal with them as a
special case. LZW compression, it is argued, will make the extra space
requirements insignificant. And although this  will  put extra work on
writers converting  palette  colour  to  RGB,  the  readers  are  in a
majority.



            Exact colorimetric and photometric information
            ----------------------------------------------

301 ColorResponseCurves
     Type = 3 (words)
        # = 3*(1<<BitsPerSample)  i.e. 3 * (# possible sample values)

     Lookup tables of more  exact  intensity  values for each possible
     sample value in RGB images.

     This field is never  required  and  probably  of interest only to
     very professional users, who are not  content with the basic one-
     for-each-step intensity scale of the unmodified sample values.

     Each entry is a fully used word, corresponding to an intensity
     value 0-65535 (0-$FFFF). First come the red values for each
     possible sample value (i.e. 256 of them for a true colour, 8 bits
     per sample, image). Then come all the greens followed by the
     blues.

     Default:  Tables  calculated   according   to,   for  each  entry
     (corresponding to a possible sample value):

                      Sample    2.2
     Intensity  =  ( --------- )     *  65535   rounded
                     MaxSample
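
The default formula above can be tabulated with a few lines of
Python. This is a sketch with an invented function name; it builds
the table for ONE colour component, and the full default field would
then be this table three times in a row (red, green, blue).

```python
def default_response_curve(bits_per_sample):
    """Default intensity table for one colour component."""
    max_sample = (1 << bits_per_sample) - 1
    # Intensity = (Sample / MaxSample) ** 2.2 * 65535, rounded.
    return [round((sample / max_sample) ** 2.2 * 65535)
            for sample in range(max_sample + 1)]
```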


290 GrayResponseUnit
     Type = 3 (word)
        # = 1
     Tells how to interpret numbers in GrayResponseCurve (see below).
     1 = tenths
     2 = hundredths
     3 = thousandths
     4 = ten-thousandths
     5 = hundred-thousandths of a unit.
     Default is 2, but 3 is now recommended.


291 GrayResponseCurve
     Type = 3 (words)
        # = 1<<BitsPerSample  (i.e. # of possible sample values)

     A table of  'optical  density'  values  for  each possible sample
     value. This field simply is  to  a  greyscale image what a Color-
     ResponseCurves field is to an RGB image; a more exact photometric
     scale than the basic one-for-each-step scale of the samples.

     A GrayResponseCurve field is never required,  but - at least when
     BitsPerSample > 1 - an image  reader  should always look for one.
     If encountered it  overrides  any PhotometricInterpretation field
     (which writers still are advised to include as well though.)

     Note that  0  (no  'greyness'  =  white)  corresponds  to maximum
     intensity  and   vice   versa,   so   the   normal   Photometric-
     Interpretation=1  (i.e.   sample   0=black)   corresponds   to  a
     decreasing GrayResponseCurve.

     The grey density values are  read  as decimal fractions according
     to above GrayResponseUnit field,  typically  on  a scale from 2.0
     (maximal 'greyness' = black) to 0.0 (white). If an exact physical
     scale is unknown, the following formula  is suggested by the TIFF
     definers to calculate reasonable values in between:
     Constant*log10(MaxIntensity/Intensity), where the Constant should
     be selected to make the density decrease from intensity 0 (set to
     density 2.0)  to  intensity  1  significantly  steeper  than  the
     following decreases. E.g. for  a  16-step scale (MaxIntensity=15)
     the Constant can  be  set  =1  which  would if GrayResponseUnit=3
     give: (2000,) 1176, 875, 699, 574,  477, 398, 331, 273, 222, 176,
     135, 97, 62, 30, 0.
     (For more advanced curves the TIFF definers refer to the Kodak
     Reflection Density Guide, catalogue number 146 5947.)

     No default mentioned.
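
The suggested formula, with a base 10 logarithm, can be checked
against the 16-step example above with the following Python sketch.
The function name and its parameters are invented here; intensity 0
is set by hand to the 'black' density, since the logarithm is
undefined there.

```python
import math

def gray_response_curve(max_intensity, unit=3, constant=1.0, black=2.0):
    """Density table for intensities 0..max_intensity (0 = black)."""
    scale = 10 ** unit                      # unit 3 means thousandths
    curve = [round(black * scale)]          # intensity 0 is set by hand
    for intensity in range(1, max_intensity + 1):
        density = constant * math.log10(max_intensity / intensity)
        curve.append(round(density * scale))
    return curve
```

Calling gray_response_curve(15) reproduces the sixteen values listed
above, from 2000 down to 0.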


318 WhitePoint
     Type = 5 (rationals, i.e. pairs of longs: numerator, denominator)
        # = 2
     This and the following are for the real pros. In RGB images.
     "White point" of the  image,  in  the  1931  CIE xyY chromaticity
     diagram, omitting the luminance (last coordinate).
     Default is the SMPTE white point, D65: x=0.313, y=0.329.

319 PrimaryChromaticities
     Type = 5 (rationals)
        # = 6
     Primary colour chromaticities:  red  x,y,  green  x,y,  blue x,y.
     Default is the SMPTE primary  colour chromaticities: red x=0.635,
     y=0.340, green x=0.305, y=0.595, blue x=0.155, y=0.070.



                         Informational Fields
                         --------------------

315 Artist
     Type = 2 (ascii)
     Person who created the image, plus any copyright message.
     You may want to put the actual string in the TIFF file beginning,
     right after the 8 byte header;  the  TIFF system with pointers to
     IFD as well as to field contents, allows this to be easily done.

306 DateTime
     Type = 2 (ascii)
        # = 20 (i.e. 19 characters + a null)
     Date & time of image creation in the format:
     "YYYY:MM:DD HH:MM:SS", (HH = 00-23; space between date and time.)

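A DateTime value in the format described above could be produced as
in this Python sketch. The function name is invented for the example,
and local time is assumed to be the wanted time.

```python
import time

def tiff_datetime(t=None):
    """Return the 20-byte DateTime field value (19 characters + null)."""
    stamp = time.strftime("%Y:%m:%d %H:%M:%S", time.localtime(t))
    return stamp.encode("ascii") + b"\0"
```
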
270 ImageDescription
     Type = 2 (ascii)
     General short one-line comment e.g. "1988 company picnic"

316 HostComputer
     Type = 2 (ascii)
     E.g. "Atari ST"

271 Make
     Type = 2 (ascii)
     Manufacturer of scanner, video digitizer, or whatever.

272 Model
     Type = 2 (ascii)
     Model name/number of scanner, video digitizer, or whatever.

305 Software
     Type = 2 (ascii)
     Name and release number of software package that created image.



                           Document fields
                           ---------------

Fields probably intended primarily for use with fax documents, but
which might be found useful for other things. Not required in classes
B,G,P,R.

297 PageNumber
     Type = 3 (words)
        # = 2
     For a multiple page (e.g. fax) document:
     First word : Page number (beginning with 0)
     Second word: Total number of pages in document.
     Pages need not appear in numerical order. No default.

269 DocumentName
     Type = 2 (ascii)
     Name of the document from which this image was scanned.

285 PageName
     Type = 2 (ascii)
     Name of the page from which this image was scanned. No default.

286 XPosition
     Type = 5 (rational, ie 2 longs: first numerator then denominator)
        # = 1
     X offset of image in page, in ResolutionUnits. No default.

287 YPosition
     Type = 5 (rational, ie 2 longs: first numerator then denominator)
        # = 1
     Y offset of image in page, in ResolutionUnits. No default.


266 FillOrder
     Type = 3 (word)
        # = 1
     This is an old field, no longer recommended in normal TIFF files,
     but which IS required in TIFF F (fax documents).
     It defines how the bits in image data bytes are to be read:
     1 =  'Normal left to right' i.e. most significant bit first.
     2 =  'Backwards' least significant bit first.
     Default is 1.
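
A reader that wants all image data in the normal bit order can
reverse the bits of every byte when FillOrder=2. The Python sketch
below does this with a 256-entry lookup table; the names REVERSED and
normalise_fill_order are invented for this example.

```python
# Table giving, for every byte value, the same byte with its bits reversed.
REVERSED = bytes(int(format(b, "08b")[::-1], 2) for b in range(256))

def normalise_fill_order(data, fill_order):
    """Return image bytes with the most significant bit first."""
    if fill_order == 2:
        return bytes(data).translate(REVERSED)
    return bytes(data)
```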

