
    ICTARI USER GROUP              ISSUE 17                  December 1994

         ___   ______     ___       _________   _________   ___
         \__\  \   __\    \  \__    \______  \  \   _____\  \__\
           ___  \  \       \  __\     _____\  \  \  \         ___
           \  \  \  \       \  \      \  ____  \  \  \        \  \
            \  \  \  \_____  \  \____  \  \__\  \  \  \        \  \
             \  \  \       \  \      \  \        \  \  \        \  \
              \__\  \_______\  \______\  \________\  \__\        \__\

                     *   m   a   g   a   z   i   n   e   *

     =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
                       I C T A R I   U S E R   G R O U P
       63 Woolsbridge Road, Ringwood, Hants, BH24 2LX   Tel. 0425-474415
     =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=

                              INDEX FOR ISSUE 17
                              ==================

    ASSEMBLY        GEM Window redraw routine.
                    Sprite packer/unpacker routines.
                    Rectangle intersection detection code.
                    Shareware program encryption system.

    C               Useful MagiC C functions.
                    GEM Tutorial by J White. Part 7.
                    Prime number generator.

    GFA             Mandelbrot generator.
                    ShadeBob, 4 bit real time shading.
                    GFA menu locating techniques.

    STOS            Factorial calculator program.
                    Wordsearch puzzle solving program.
                    Using STOS on the Falcon.

    PASCAL          Program to calculate shift bonuses, etc.

    MISC            Algorithm Corner. Roman/Arabic number conversion.
                    Barrel program. Printer output redirection.
                    Current membership list.
                    Index to Issues 1-15.

    In next month's issue of ICTARI (this may change) :-

    ASSEMBLY        Roman to Arabic number conversion routine.
                    Arabic to Roman number conversion routine.
                    AES/VDI function reference sheets.

    C               GEM Tutorial by J White. Part 8.

    GFA

    BASIC           CAD 3D to DXF converter program.

    STOS            Screen demo code routines.

    PASCAL

    MISC

    For future issues :-

    Polyline drawing routines in machine code.
    Bezier curve drawing routines.
    Picture compression routine for IMG pictures.
    HP DeskJet/LaserJet compression routines (Mode 2 TIFF).
    Picture switching techniques.
    Printer driver code for printing mono/colour images.
    Using Calamus fonts.
    Sorting algorithms.
    Using the BitBlit routines.
    Code for using the File Selector.
    Using multiple rasters in machine code.
    Overscan techniques.
    STOS sprites.
    ----------------------------------------------------------------------
                                  EDITORIAL
                                  =========
    MEMBERSHIP
    ----------
    Another three members for this month; welcome to them.

    FOREIGN LANGUAGES
    -----------------
    I am writing a  GEM  program  (a  Calamus  font display/print program)
    which could be used in other European countries. As we have members in
    Sweden, Denmark, Finland, Belgium and Greece, perhaps they could let
    us know the best way to convert a program to their native language.
    If all the text displays could be confined to the associated resource
    file, is it possible to just rewrite the resource file with the
    equivalent foreign words, or are there other complications? Does
    the ATARI operating system  contain  all  the different characters for
    European languages (including Greek)  or  is  the ROM itself different
    for foreign countries?  Are  the  keyboards  different  to the English
    versions and if not, how do you enter characters with accents on them?
    If anyone  has  information  on  writing  programs  for  use  in other
    languages perhaps they could send it in to us for publication.

    NEXT MONTH
    ----------
    If next month's issue looks a little sparse, it's just because I
    haven't had time to sort it out yet; too much time spent Christmas
    shopping, I think. Hopefully, most of you will spend some of the
    Christmas
    holiday preparing some more articles for ICTARI.
    ----------------------------------------------------------------------
                                CORRESPONDENCE
                                ==============
    To: *.*
    From: Martin Milner

    Problem 1
    ---------
    I couldn't get the fire button on my joystick to register anything in
    Nick Bates' joystick routine (Ictari 15). My machine is a 2.5MB STE
    with TOS 1.62. (It doesn't work with MagiC, either). It worked fine in
    STOS.

    I was going to write  about  it,  but after much experimentation, I've
    found it works if you disable the mouse first, so now I won't bother!
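
    */ For anyone who wants to try the same thing from their own code, one
    way to disable the mouse (only a sketch, and it may well not be how
    Martin did it) is to send the IKBD 'disable mouse' command with the
    XBIOS Ikbdws call, and restore relative mouse reporting afterwards:

        #include <osbind.h>

        void mouse_off(void)
        {
            static char off = 0x12;     /* IKBD command: disable mouse  */
            Ikbdws(0, &off);            /* send 1 byte (count = len-1)  */
        }

        void mouse_on(void)
        {
            static char on = 0x08;      /* IKBD: relative mouse reports */
            Ikbdws(0, &on);
        }

    ICTARI /*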

    Problem 2
    ---------
    I am writing a game in STOS making  heavy use of the zoom command, but
    zoom is so slow. Has anyone got a faster assembler routine which I
    could use to replace it?

    It also uses the reduce command, but I have got an assembly program
    (kindly written by Lee Upcraft) which, hopefully, Martin Cubitt is
    going to put into his EXTRA extension when he gets time.

    If anyone could produce a zoom routine, then perhaps he could add that
    too.

    A quick comment on the subject of disk format, etc.

    I  prefer  the  disks  to  be  standard  9  sector  format  and  MSDOS
    compatible. This is so that I can  read  them  on my PC at work! I can
    also print out any documentation there. The magazine is so
    cheap, I can't see why anyone  should  think it's not value for money.
    Also, the extra files that would have  gone on an extended format disk
    can go in the next issue, making your available articles go
    further!

    */ This is basically what we try to do; the problem is that sometimes
    we get articles or programs which are very large. If we didn't
    compress them or use a 10 sector format we would probably end up with
    only one or two items on the disk. Generally, though, we will try to
    keep to the standard format as much as possible.  ICTARI /*
    ----------------------------------------------------------------------
    To: Peter Hibbs
    From: PoshPaws

    As far as I can  tell,  most  dictionaries  use the same simple coding
    technique:

    All words are stored  in  alphabetical  order  and  are subjected to a
    peculiar form of delta coding  (see  last  issue),  such that only the
    changes to the last word (and where the changes start) are stored.

    e.g.    ACTUAL,ACTUALISE,ACTUALISED,ACTUALISES,ACTUALISING,ACTUALIST
    becomes ACTUAL   (6) ISE      (9) D      (9) S     (8) ING     (8) T

    where the numbers in brackets are  ASCII numbers representing how many
    letters to keep and are immediately followed  by the letters to add to
    make the new word.
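
    */ A rough C sketch of the rebuilding step (the exact way the counts
    and suffixes are stored on disk is assumed here, not taken from any
    real dictionary file):

        #include <stdio.h>
        #include <string.h>

        /* Rebuild the next word: keep 'keep' letters of the previous
           word, then append the changed ending. */
        static void next_word(char *word, int keep, const char *suffix)
        {
            word[keep] = '\0';          /* truncate to the shared prefix */
            strcat(word, suffix);       /* add the new ending            */
        }

        int main(void)
        {
            static const struct { int keep; const char *add; } packed[] = {
                { 6, "ISE" }, { 9, "D" }, { 9, "S" },
                { 8, "ING" }, { 8, "T" }
            };
            char word[32] = "ACTUAL";   /* first entry stored in full */
            int i;

            puts(word);
            for (i = 0; i < 5; i++) {
                next_word(word, packed[i].keep, packed[i].add);
                puts(word);             /* ACTUALISE, ACTUALISED, ... */
            }
            return 0;
        }

    ICTARI /*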

    I would never recommend checking every entry in a dictionary and would
    put 27 longwords pointing to the  start of each letter. Deleting words
    would just mean putting null bytes (0)  where the word was and getting
    the checking routine to ignore null words (of length zero).

    Adding words would normally mean  a  re-sort  of  the file so the 27th
    longword pointer would point to  a  miscellaneous area where words are
    added. This would be searched if the first letter search was
    unsuccessful. If the number of words searched in the letter section
    was
    compared to the number of words searched in the miscellaneous section,
    this could be used as a measure of when a re-sort is needed.
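
    */ In C the first-letter index might look something like this (only a
    sketch; the word data layout and the offsets are assumptions):

        /* 26 pointers to the A..Z sections plus one to a 'miscellaneous'
           area where newly added words go until the next re-sort. */
        typedef struct {
            long letter[26];    /* file offsets of the A..Z sections   */
            long misc;          /* offset of the unsorted added words  */
        } DictIndex;

        /* Pick the section to search first for a given word. */
        long section_for(const DictIndex *idx, const char *word)
        {
            char c = word[0];
            if (c >= 'a' && c <= 'z') c -= 32;        /* fold case */
            if (c < 'A' || c > 'Z')   return idx->misc;
            return idx->letter[c - 'A'];
        }

    ICTARI /*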

    The idea of using extra bytes  for  letter combinations is a good one,
    although  smaller  combinations  would  be   more  effective  I  think
    (qu,th,ph,ch,sh,st,etc). This would mean extra pointers for words
    starting with these combinations and would require that the word to be
    searched for is converted (tokenised) into the same format.

    Bits for common endings are great for things like Scrabble, where you
    don't know the word you're looking for, but you would have to consider
    the overheads involved in bit testing the ending byte for the bits you
    need. You could, with a bit of work, use the same technique for common
    starts (pre,re,anti,etc).

    On the other hand you could Huffcode  the words as they stand and then
    Huffcode the word you are looking  for  with the same tree and compare
    bit patterns. You would find the  space saving would be excellent, but
    the search overheads would be quite large in terms of speed.

    P.S. Quicksort is only fastest if the file to be sorted is not already
    in order; with a simple pivot choice it becomes very slow when a file
    is already sorted. Use a
    heapsort for large sorts because it  is only 20% slower than quicksort
    at worst and very much  faster  on  a  file  that is already nearly in
    order.

    To: Dick Teuber

    There are a few programs that do what you require in rerouting printer
    output:

    I have included the program BARREL on disk, which should be able to do
    what you require for smaller printouts. If your printing programs use
    GEMDOS output calls, you can write a small program to reroute output
    by
    opening a file and 'Fforce'ing the  printer handle to that file handle
    before your program is called.
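
    */ A minimal sketch of that idea (the file and program names here are
    just examples); GEMDOS standard handle 3 is the printer, so Fforce'ing
    it to a file handle sends printer output to the file instead:

        #include <osbind.h>

        int main(void)
        {
            long fh = Fcreate("PRN_OUT.TXT", 0);  /* file to capture output */

            if (fh >= 0) {
                Fforce(3, (int)fh);               /* PRN: now goes to file  */
                Pexec(0, "PRINTME.TOS", "", 0L);  /* run printing program   */
                Fclose((int)fh);
            }
            return 0;
        }

    ICTARI /*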

    There is a  program  floating  about  called  re-route  (which  I have
    somewhere) that re-routes to the serial  port,  but I don't know if it
    will route to a file.

    If you're still  stuck,  let  us  know  exactly  why  you're stuck and
    something can easily be written to do what you need.

    To: Marten Lindstrom

    ALL Atari Screens are WORD INTERLEAVED  (including all TT modes that I
    am aware of). Mono can be thought  of  as 1 plane of word interleaved.
    High Colour (previously called True  Colour)  is  a special case where
    each word is a pixel colour value  and  can be thought of as one plane
    of word sized pixels.
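
    */ For anyone unfamiliar with the layout, a small sketch of addressing
    a pixel in a word interleaved screen (a sketch only; 'words_per_line'
    means words per plane per line, e.g. 20 in ST low resolution):

        /* Return a pointer to the plane 'p' word that holds pixel (x,y)
           and the bit number within that word (leftmost pixel = bit 15). */
        unsigned short *plane_word(unsigned short *screen, int planes,
                                   int words_per_line, int x, int y,
                                   int p, int *bit)
        {
            *bit = 15 - (x & 15);
            return screen + (long)y * words_per_line * planes
                          + (x >> 4) * planes + p;
        }

    ICTARI /*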

    As far as I can  tell,  VsetRGB  waits  for  the  VBL to do the actual
    update.
    ----------------------------------------------------------------------
    To: Peter Hibbs
    From: Keith Baines

    Text compression in dictionary files.

    You asked about this in  the  November  1994  Ictari  disk. I did some
    assembler programming using dictionary files  a  few  years ago when I
    developed a spelling checker  desk  accessory  to  use with the Wordup
    word processor. This used a mixture of RAM and disk based dictionaries
    and coded words to keep file sizes and memory usage down. The whole DA
    would be too long to include  in  the  disk magazine (and I guess that
    spelling checkers are a minority  interest  among  members) and in any
    case is no  use  to  anyone  without  the  commercial  program  it was
    designed to complement. So I've  extracted  the  main routines and put
    them in a TOS program to demonstrate  how  they work, and written up a
    commentary on why I chose those particular dictionary formats.

    */ Thanks for the code and programs; we will publish them early next
    year.  Keith's  code  also  uses  the  techniques  described  by  Posh
    Paws  above  and  should   be  of   interest   to  anyone  using  text
    compression. See also next letter.  ICTARI /*
    ----------------------------------------------------------------------
    To: Peter Hibbs (and others interested)
    From: Mårten Lindström

    DICTIONARY FILES

    It  so  happens  that  I  made  some  investigations  of  the  Protext
    dictionary files a while ago. Unfortunately  the personal notes I made
    are a bit  messy,  and  when  reading  them  now  I  discovered  a few
    suspicious things that make me  not  trust  them  100%. (But if you or
    anyone else are interested I  could  tidy  them  up.) I think you only
    wanted the general ideas though, rather than the exact file format, so
    here goes:

    The words in all Protext dictionary files are stored in (a kind of)
    alphabetical order. For each word a number is stored that tells how
    many letters to remove from the end of the previous word when deriving
    the current one, plus the letters to add to this root.

     E.g. 'shock' and 'short' could, figuratively, be stored as 'shock2rt'
    (ASCII isn't really used for either the letters or the count numbers).

    The range of possible letters  excludes special uppercase versions but
    includes hyphen and apostrophe. (See below for names/abbreviations.)

    The information unit in the LEX files  is  a byte, i.e. it has a range
    of 256 different possible values, much  more than needed to cover each
    of the used letters in any  (western)  language. The excess is partly,
    as you guessed, used as  codes  for  common endings, partly for common
    two-letter combinations to be used anywhere in a word (about a hundred
    of each) and partly for  the  'removal  counts' mentioned above (0-15,
    more than one code can be combined), that also - in the main LEX files
    - serve as separators  between  each  word.  Each  of the ending codes
    actually has a count/separator function "built in", which means that
    they often  add  as  little  as  one  byte  per  word  to  the  (main)
    dictionary. (The user LEX files are  not  as tightly built as the main
    ones and each word here will take up at least 6 bytes.)

    A few codes are also used for  special functions and the last possible
    code value, 255, is used as  an  'escape' for the following byte, thus
    making available a further 255 possible  values. I haven't seen all of
    the codes used though and of the 'escaped' codes only a few. These are
    codes at the end of a word to mark it as a name or abbreviation, in
    which the first or all letters are required to be upper case, and/or
    the last or all letters are followed by a stop. One code is used to
    add the literal ASCII spelling of a word with foreign letters. For
    instance, if English is used and a word like, say, 'Mårten' is stored,
    something like 'Marten<AsciiBegin>Mårten<AsciiEnd>' would be written
    (where the first Marten wouldn't be in ASCII but encoded). An ASCII
    supplement would  also  be  used  for  words  with  irregular  use  of
    uppercase letters or stops (neither  all  of  the letters nor just the
    first/last of  them  affected).  Finally,  I  made  this investigation
    before automatic hyphenation was implemented  in Protext but maybe you
    could now expect to see some  codes  used  to mark deviations from the
    general hyphenation rules.

    In the QIC files the information  unit  is  a nybble (half a byte), so
    its number of possible  values  is  only  16.  But  by  using the last
    possible value - 15 - as an  escape  code a further 15 values are made
    available for a total of 31, enough for an alphabet but with no great
    margins. (In fact all of  the  31  codes  are  used  as letters in the
    Swedish QIC file. In the English  file  'only' 28 including hyphen and
    apostrophe. The assignment is made through a table in the file header,
    a list of letters in order of frequency - most frequent letters first.
    The first 15 letters will be  assigned one-nybble codes, the rest will
    get the 2-nybble ones.) To still keep track of which nybble is a count
    and which a letter, the QIC
    dictionary is divided into subsections with words of a fixed length in
    each. The program thus  always  knows  where  each  word ends and what
    nybble to interpret as count rather than  letter. And since a count of
    zero would make no sense with a fixed word length, this can be used as
    an escape for  e.g.  names/abbreviations  (using  uppercase letters or
    stops).
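
    */ A small sketch of reading such nybble codes (the table contents and
    names are invented here; only the escape mechanism is as described):

        /* Fetch the nybble at 'pos' (0 = high nybble of byte 0). */
        static int get_nybble(const unsigned char *data, long pos)
        {
            unsigned char b = data[pos >> 1];
            return (pos & 1) ? (b & 0x0F) : (b >> 4);
        }

        /* Decode one letter, advancing *pos past one or two nybbles.
           'table' is the frequency ordered letter list from the header. */
        static char get_letter(const unsigned char *data, long *pos,
                               const char *table)
        {
            int n = get_nybble(data, (*pos)++);
            if (n == 15)                    /* escape: two-nybble code */
                n = 15 + get_nybble(data, (*pos)++);
            return table[n];
        }

    ICTARI /*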

    But the QIC format may  not  be  ideal  for  adding and removing words
    to/from the list. The LEX files are split into 1K sections (spookily
    the exact size of a DOS cluster on disk), each of which begins with
    an uncompressed word (not derived from earlier words). This word is
    also stored in an index for the dictionary, so that the program can
    quickly establish in which section a certain word could possibly be
    found. (In the main LEX files this lead word is removed from the
    dictionary itself to save a little space. These files also have each
    1K section divided into 8 subsections, not represented in the index,
    whose lengths vary in bytes but are fixed in number of words.)

    The user LEX files use a  random  order  for  the 1K sections, each of
    them with a header containing the number of the next and the number of
    bytes actually used in this section.  By keeping a little workspace in
    each section you would usually be  able to insert words without having
    to rearrange the data in more than  one section. Some space is wasted,
    but if you feel the  need  for  it  you  could  perhaps sometimes do a
    global reorganization.


    SOME FURTHER NOTES: If I were to  design  a dictionary (at least for a
    language with more inflected noun forms than English - and, not
    surprisingly, I'm thinking of Swedish) I would probably include an
    inflection class code with each word. This could save space and, more
    importantly, when using Protext I have found it very frustrating to be
    repeatedly warned about words that I know I have already added to the
    user dictionary - only in a differently inflected form. Not many
    classes are needed, one  of  which  would  be 'irregular' meaning that
    each form is stored as a separate word.

    But,  admittedly,  for  English  -  with  its  correspondence  between
    pronunciation and spelling -  the  phonetic  information stored in the
    Protext LEX files may be more  useful, especially for someone like me.
    (And thank you editor for  correcting  some  other  mistakes I make in
    this language.) This works by assigning  a range of different codes to
    one and the same letter depending on  which  sound value it has in the
    specific context (e.g. 'p' is encoded  differently when in front of an
    'h' than in  most  other  positions,  'c'  has  a  k-sound code before
    certain vowels and an s-sound code before others etc.) and each 'x' is
    represented by two codes - one  for  the  k-  and one for the s sound.
    Also, general sound character  values  are  calculated  for each word,
    based on its consonantal sounds -  and  only  6  or 7 general types of
    these are distinguished, e.g. all 'tongue explosives' (d/t/g/k-sounds)
    are treated as equal for this purpose. These values are encoded as two
    letters prefixing each word. (But they will probably add less to the
    file size than you might first think, considering how the compression
    is done.)

    One further special code  that  could  deserve  to  be  mentioned is a
    'consonant doubler', i.e. a code  that  is  used  in place of a second
    consonant of the same type  (except  c).  Although  this code gets its
    full meaning only in  the  phonetic  system  of  Protext  (where it is
    silent), it can possibly be of some use outside it as well, namely as
    the first code in definitions  for  some  of the endings (and 2-letter
    codes).

    Hope that this is of some help!

    To: *.*

    Last month I got very excited about the Civilization IFF ILBM pictures
    containing 5 bitplanes and 32  colours.  Well  they  do, but on closer
    inspection the 5th plane turns out to  be cleared (in the ST version).
    Sorry about that. Similarly the  images  are  256  lines high (56 more
    than the ST  screen)  but  the  bottom  56  are  cleared.  This means,
    although the images are compressed of course, that there is about half
    a K of useless garbage in each image file.

    I still believe that  the  principle  holds  true though, that 5-plane
    images often look acceptable even when  the  5th plane is skipped. The
    reason is of course partly that developers for both Amiga and ST
    could save some  effort  designing  images  that  way  (I  wouldn't be
    surprised if the Amiga Civilization images are the same as the ST ones
    with the 5th plane  filled  in).  And  I  also  believe  that a common
    practice on the Amiga anyway is  to  make  the last 16 colours 'shadow
    versions' of  the  first  16.  (Maybe  the  same  principle  could  be
    considered even by 256-colour artists on the mighty Falcon.)

    GIF and TIFF

    With this message I send to Ictari  a  description of the GIF and TIFF
    picture file formats. This may  seem superfluous, considering that the
    full official documentation, upon which I have mostly drawn, should be
    available to  all  Ictari  members  anyway.  (The  TIFF  document  was
    published in Ictari 16; the GIF documents I am sending now in the
    unlikely case that you don't already have them.) I still hope there is
    some value in  a  slightly  shorter  description  written  by an Atari
    programmer for other Atari  programmers  to  read, and containing some
    experiences I gained when writing depacking routines for these formats.

    Would anybody be interested in GIF and TIFF routines to complement the
    PICPAC collection (the assembler routines in Ictari 16)? I have some
    preliminary routines ready (depacking only  at this point), which will
    probably need some polishing, though.

    */ Please do send the GIF/TIFF  routines  when they are ready. We will
    publish the GIF  document  files  over  the  next  few  issues. In our
    opinion you can never  have  TOO  much  information on programming and
    thanks also for the text compression information. Peter Hibbs /*

    --------------------------- End of file ------------------------------
