# -------------------------------------------------------------------
# BLITTER                               (c) Copyright 1996 Nat! & KKP
# -------------------------------------------------------------------
# These are some of the results/guesses that Klaus and Nat! found
# out about the Jaguar with a few helpful hints by other people, 
# who'd prefer to remain anonymous. 
#
# Since we are not under NDA or anything from Atari we feel free to 
# give this to you for educational purposes only.
#
# Please note, that this is not official documentation from Atari
# or derived work thereof (both of us have never seen the Atari docs)
# and Atari isn't connected with this in any way.
#
# Please use this informationphile as a starting point for your own
# exploration and not as a reference. If you find anything inaccurate,
# missing, needing more explanation etc. by all means please write
# to us:
#	  nat@zumdick.rhein-main.de
# or
#	  kkp@gamma.dou.dk
#
# If you could do us a small favor, don't use this information for
# those lame flamewars on r.g.v.a or the mailing list.
#
# HTML soon ?
# -------------------------------------------------------------------
# $Id: blitter.txt,v 1.11 1996/01/28 20:23:19 nat Exp $
# -------------------------------------------------------------------

Incomplete.
Be wary that some stuff which sounds like we know what we're talking
about, might be just a guess w/o a check :)


The BLiTTER
-----------

The Blitter is a little different to what you're used to
on your ST (and you probably didn't get used to it very much
anyway).

You can blit a scaled pixmap to an unscaled destination
or you can blit an unscaled pixmap onto a scaled destination. Or you can
rotate the source and the destination bitmap, and in some cases you
can scale and rotate at the same time (I think scaling up and rotating
without leaving holes isn't possible).

The former will probably be the most often used. The source or the
destination can be arbitrarily 'angled' lines and need not be
contiguous addresses. Furthermore you can blit pixels of 1 bit, 2 bit,
4 bit, 8 bit, 16 bit or 32 bit depth.

The Blitter in broad outline works like this:

The blitter has two channels called A1 and A2, which it reads data from
and writes data to.
A1 is the sophisticated channel allowing fractional pixel treatment
(like f.e. read pixel 1 twice, then pixel 2 twice etc. for an effective
scaling of 2.0), whereas A2 is a simple channel allowing only integer
increments of the addresses. This means that A2 can only be used for
straight or diagonal lines.

Picture in your mind that a channel is pointing to a square bitmap. 
You define the width of this bitmap and the origin at which the blitter 
should start fetching data. The origin might for example be the center 
of your bitmap, or the upper left corner, you decide!
You then define the orientation (slope) of the line the blitter should
'draw' into (or 'fetch' from) this bitmap.

            .....width.......
channel --> +----------------+
            |                |
            |  x  (origin)   | 
            |   \            |
            |    \  (slope)  |
            |                |
            +----------------+


In a real life environment you might for example use A2 as the source
of your texture, that is stored as a contiguous block in memory and
A1 is used to draw an arbitrary scaled and angled line of your polygon.
Or you might use A1 to traverse the texture data at an arbitrary angle
and update the destination pixmap in a scanline fashion horizontally
left to right.

If you want to scale the bitmap you gotta figure out whether you want
to shrink or to enlarge. If you want to enlarge, you need to use A1
as the source and A2 as the destination, using fractional incrementing
on the source. If you want to shrink, you want to use A1 as the destination
(also with fractional increments).
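
To picture the fractional incrementing, here is a small software model in C.
It is only a sketch and not blitter code: the function name and the 16.16
fixed point format are assumptions made for illustration. With an increment
of 0.5 every source pixel is read twice, which gives the enlargement by 2.0
mentioned earlier.

```c
#include <stddef.h>
#include <stdint.h>

/* Software model of fractional source stepping (assumed 16.16 fixed point).
 * The position starts at 0.0; after every destination pixel the increment
 * is added and the integer part selects the next source pixel.
 * The caller must make sure src is large enough for the positions reached. */
void step_line(const uint16_t *src, uint16_t *dst, size_t n_dst,
               uint32_t inc_16_16)
{
    uint32_t pos = 0;                 /* 16.16 source position */
    for (size_t i = 0; i < n_dst; i++) {
        dst[i] = src[pos >> 16];      /* integer part picks the pixel */
        pos += inc_16_16;             /* fractional part accumulates */
    }
}
```

Calling step_line() with an increment of 0x8000 (0.5) on four source pixels
fills eight destination pixels, each source pixel appearing twice.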


You can do a few operations concurrently while blitting your data.

If you're drawing with a single color and outputting crycolor pixels 
you can gouraudshade them at the same time at no extra cost. The 
blitter will use the intensity of the pixel and add the contents of a 
register to it (saturating add).
The contents of this register are then updated for the next pixel.
Since the update is fractional, you can achieve a smooth shaded
line with this.
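
As a rough picture of the saturating, fractional intensity update, here is a
C sketch. It is a simplified software model, not a description of the real
datapath: it assumes a single colored line, an 8.16 fixed point intensity
and a signed delta, and the function name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of the shading accumulator: intensity is kept as 8.16 fixed point
 * (8 integer bits, 16 fraction bits) and the delta is signed. The add
 * saturates at 0 and at the maximum instead of wrapping around. */
void shade_line(uint8_t *intensity_out, size_t n,
                uint8_t start, int32_t delta_8_16)
{
    const int32_t max = (255 << 16) | 0xFFFF;
    int32_t acc = (int32_t)start << 16;

    for (size_t i = 0; i < n; i++) {
        intensity_out[i] = (uint8_t)(acc >> 16); /* integer part is drawn */
        acc += delta_8_16;                        /* update for next pixel */
        if (acc < 0)   acc = 0;                   /* saturate low */
        if (acc > max) acc = max;                 /* saturate high */
    }
}
```

Starting at intensity 250 with a delta of 2.5 (0x28000) the line comes out
as 250, 252, 255 and then stays at 255 instead of wrapping.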

You can add an intensity factor to your incoming crycolor data.

***********************************************************************
Z-BUFFER DOC IS WRITTEN WITH NO KNOWLEDGE ABOUT THE SUBJECT WHATSOEVER 
***********************************************************************
You can also use the Z-buffering capabilities of the blitter. 
Your destination data is not just an array of pixel values, but rather a 
combination of Z-data and pixel data. Consider the Z-data to be the 
third coordinate providing depth. 
The smaller the value, the nearer it is to the viewer. (usual convention)
You set up the blitter with a starting Z-value for your line and a factor 
that should be added for every pixel step, thereby possibly increasing 
or decreasing the Z-position. 

That value is then compared to the Z-data of the destination pixel. 
If the z-value (in the registers) is less than the destination value, 
the pixel will be written - else the pixel will not be written. The 
destination pixel is then updated with the new Z-buffer value. 
The Z-data is 8 bit large ?

Unfortunately Z-buffering in detail is still unknown.

You can probably do collision detection on background colors, and
transparent blits...


Phrasemode and Pixelmode. 
------------------------

The blitter can operate either in pixelmode or in phrasemode. Phrasemode
is (in 16-bit crycolor) usually four times faster and is therefore much more
desirable. But there are some limitations that are connected to phrasemode:

o  Both A1 and A2 must work in phrasemode, you can't have one running in
   pixelmode and the other in phrasemode

o  Phrasemode implies linear address (or horizontally oriented) blits
   It looks like phrasemode doesn't work with all resolutions. So
   far only 16bit modes are known to work.

Scales and rotates aren't possible in phrasemode.
[ Please note, that you can do (non rotated) sprite scaling also with an OP
object, which might (or not) be more convenient ]

There's probably something wrong with my understanding of the blitter,
because I have a problem with the source DMA channel of the blitter.
Currently you gotta figure, that the machine takes

   time = [1 cycle write] + [4 cycles read source] + [1 cycle read destination]
   
regardless of pixelmode or phrasemode. Of course in phrasemode you usually
speed up the blitting process in 16 bit pixel mode by a factor of four.

Maybe the delay in the read is needed to get the ROM timings (32bit
organized and maybe needing one waitstate) correct. If this is right
then there should be a bit somewhere to turn this off. But where is it ? 
It seems not to be in the CMD register.

But if the above is true this means that the Blitter is capable of doing
approximately:

   Gouraud pixelmode  : 13.3 / 1 = 13.3 Mio pixels / second
                              or ca. 222000 pixels / frame
   Copyblit pixelmode : 13.3 / 5 =  2.7 Mio pixels / second 
                               or ca. 44000 pixels / frame
   XOR blit pixelmode : 13.3 / 6 =  2.2 Mio pixels / second
                               or ca. 37000 pixels / frame

and in phrasemode (16 bit pixels)

   Gouraud phrasemode  : 13.3 * 4 / 1 = 53.2 Mio pixels / second
                                   or ca. 887000 pixels / frame
   Copyblit phrasemode : 13.3 * 4 / 5 = 10.6 Mio pixels / second
                                   or ca. 177000 pixels / frame
   XOR blit phrasemode : 13.3 * 4 / 6 =  8.9 Mio pixels / second
                                   or ca. 148000 pixels / frame
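
The arithmetic behind these figures can be redone with a few lines of C.
This is only a sketch of the cost model guessed at above (1 cycle write,
4 cycles source read, 1 cycle destination read); the function names are
made up, and the 60 frames per second is an assumption that the per-frame
figures seem to be based on.

```c
#include <stdint.h>

/* Pixels per second for the guessed cost model.
 * clock_hz:       13300000 for the 13.3 MHz used in the tables
 * pixels_per_op:  1 in pixelmode, 4 in phrasemode with 16 bit pixels
 * cycles_per_op:  1 (write only), 5 (write + source read),
 *                 6 (write + source read + destination read) */
uint32_t blit_pixels_per_second(uint32_t clock_hz, uint32_t pixels_per_op,
                                uint32_t cycles_per_op)
{
    return (uint32_t)((uint64_t)clock_hz * pixels_per_op / cycles_per_op);
}

/* Pixels per frame, assuming 60 frames per second. */
uint32_t blit_pixels_per_frame(uint32_t pixels_per_second)
{
    return pixels_per_second / 60u;
}
```

For example blit_pixels_per_second(13300000, 4, 5) is the phrasemode copy
case, and feeding the result into blit_pixels_per_frame() gives the
per-frame budget.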


B_CMD:
----------
 32       28        24        20       16       12        8        4        0
  +-+------^--------+^--------+^------+-^-----+--^--+-----^----+---^--+-----+
  | |   control     |   OP    | z-op  |  ity  | mode|A1ctl|misc| dst  | src |
  +-+---------------+---------+-------+-------+-----+-----+----+------+-----+
         30..25       24...21   20..18 17..14  13.11 10..8 7..6  5..3  2..0

Writing into the lower word activates the blitter! Reading this register
gives the blitter status (bit #0 == blitter idle if set)


src:
        bit 0: SRCEN   source data read enable
        bit 1: SRCENZ  source Z-data read enable
        bit 2: SRCENX  source extra data read enable

   With this set of bits you tell the blitter what kind of
   data accesses it needs to perform. It can not figure it out from the
   way the other command bits are set and conclude what it needs to
   do, you have to instruct the blitter yourself. If you're doing
   straight copies from memory to memory, you will want to set bit0.
   If you're using the Z-buffer capabilities you'd want to set bit1
   as well.
   If your source data spans more phrases than the destination data
   then you need to set bit2 to tell the blitter to do that extra 
   phrase read.

dst:
        bit 3: DSTEN   destination data read enable
        bit 4: DSTENZ  destination Z-data read enable
        bit 5: DSTWRZ  destination Z write enable

   You'd want to set bit3, if you're doing read-modify-write cycles on
   the destination, say f.e. you're doing DST &= SRC or somesuch. Else
   you should clear this bit (or pay the price in speed decrease).
   Likewise if you're not going to do Z-buffer blitting keep bits 4 and 5
   clear, else set'em!
   Note that you can not disable 'destination write' because, you'd just
   not use the Blitter in this case, right ?

misc:
        bit 6: CLIP_A1 enable A1 clipping
        bit 7:         unused

   You can clip the pixmap that is handled with the A1 register set.
   If this bit is set, then the information in the A1_CLIP register is
   used to clip the A1 lines. See A1_CLIP for more information about
   clipping.

A1-control (A1ctl):
        bit 8: UPDA1F  enable A1 update step fraction part
        bit 9: UPDA1   enable A1 update step integer part
        bit10: UPDA2   enable A2 update step

  You hint the Blitter here, which step registers it should update.
  If you're just doing linedrawings you don't need any of these bits
  set. Only when you're blitting in two dimensions you need to consider
  these bits. The idea behind them is probably not that you can
  improve the blitter performance but rather the setup performance,
  since you know which registers change and which not and need not
  update all of them for consecutive blits.

mode:
        bit11: DSTA2   use A2 as destination
        bit12: GOURD   enable Gouraud shading
        bit13: ZBUFF   enable Z-buffer handling

   Usually (bit11 cleared) you use A2 as the source and A1 as the
   destination. You can reverse the roles by setting this bit.
   Set bit12 to enable Gouraudshading. Gouraudshading will only
   be "gouraud shading" if used on crycolor data. Use the intensity
   counters/incrementers to specify the shading (see B_IINC for
   further reference)
   With bit13 you enable Z-buffer handling (look for the A1_FLAGS for
   a small description of Z-buffer handling).


intensity (ity) and other stuff:
        bit14: TOPBEN   carry into byte
        bit15: TOPNEN   carry into nybble
        bit16: PATDSEL  use pattern data (instead of source)

  Bits 14 and 15 are explained in the gouraud shading description
  (see B_IINC). With them you control where the overflow
  from the intensity addition should be stored (added to).
  On a completely different note, if you just want to initialize a
  memory region (or draw a line) in a single color, you don't need to
  read the source data from memory. You can let the blitter pull the
  color from one of its own registers (B_PATD). This saves you on
  the average a read cycle for every phrase written, which is a good
  thing. None of the logical blitter operations apply when using the 
  pattern data register. You can't XOR your bitmap with the pattern data!

z-op:
        bit18-20:

        bit18: ZMODELT  source < destination
        bit19: ZMODEEQ  source = destination
        bit20: ZMODEGT  source > destination

        or

        0:      unused
        1:      src < dst
        2:      src == dst
        3:      src <= dst
        4:      src > dst
        5:      src != dst
        6:      src >= dst
        7:      unused

   You can tell the blitter how to decide, whether the source data should
   overwrite the destination pixel or not when using the Z-buffer mode.
   Usually you will want to put a 3 or a 1 here, so that your
   'nearer' pixels overwrite the 'farther' pixels. (Assuming that your
   Z-buffer values are the higher, the farther away from the viewer
   they are)
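
The three z-op bits combine into the eight compare modes listed above. A
minimal C sketch of that decision follows; the ZF_* names and the 32 bit
type for the Z values are assumptions for illustration (the real Z width is
not known here), and the bits are used shifted down from B_CMD bits 18..20.

```c
#include <stdbool.h>
#include <stdint.h>

/* The z-op field, shifted down so that bit18 becomes bit 0. */
#define ZF_LT 1u   /* bit18: pass if source <  destination */
#define ZF_EQ 2u   /* bit19: pass if source == destination */
#define ZF_GT 4u   /* bit20: pass if source >  destination */

/* Returns true if the source pixel should overwrite the destination. */
bool z_pass(unsigned zmode, uint32_t src_z, uint32_t dst_z)
{
    return ((zmode & ZF_LT) && src_z <  dst_z) ||
           ((zmode & ZF_EQ) && src_z == dst_z) ||
           ((zmode & ZF_GT) && src_z >  dst_z);
}
```

With zmode 3 (src <= dst) an equally near pixel still overwrites, with
zmode 1 (src < dst) it does not.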

OP:     logical operation the Blitter should perform

        bit21: LFU_NAN  ! source & ! destination
        bit22: LFU_NA   ! source &   destination
        bit23: LFU_AN     source & ! destination
        bit24: LFU_A      source &   destination

   or

         0: LFU_ZERO     DST = 0              (LFU_CLEAR)
         1: LFU_NSAND    DST = ! SRC & ! DST
         2: LFU_NSAD     DST = ! SRC & DST
         3: LFU_NOTS     DST = ! SRC
         4: LFU_SAND     DST = SRC & ! DST
         5: LFU_NOTD     DST = ! DST
         6: LFU_N_SXORD  DST = ! (SRC ^ DST)
         7: LFU_NSORND   DST = ! SRC | ! DST
         8: LFU_SAD      DST = SRC & DST
         9: LFU_SXORD    DST = SRC ^ DST     (LFU_XOR)
        10: LFU_D        DST = DST
        11: LFU_NSORD    DST = ! SRC | DST
        12: LFU_S        DST = SRC           (LFU_REPLACE)
        13: LFU_SORND    DST = SRC | ! DST
        14: LFU_SORD     DST = SRC | DST
        15: LFU_ONE      DST = 1

   Just as on the Atari ST blitter you have the usual set of
   logical operations you can perform on your data. Use 12 for your
   copying blits and 0 for your single color initialization. Note that
   if you set bit16 (use pattern data), then the blitter will NOT
   zero your buffer with OP==0, but fill it with the pattern color
   instead.
   The opcodes are ignored when bit16 is set.
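
Since each of the four OP bits selects one source/destination combination,
any of the sixteen codes can be worked out by OR-ing the selected terms. The
C sketch below does exactly that on 16 bit data. It only models the four bit
definitions given above (the function name is made up), it is not verified
against hardware.

```c
#include <stdint.h>

/* Evaluate a 4 bit LFU code on 16 bit data, one term per code bit:
 *   bit0 (B_CMD bit21): !S & !D
 *   bit1 (B_CMD bit22): !S &  D
 *   bit2 (B_CMD bit23):  S & !D
 *   bit3 (B_CMD bit24):  S &  D */
uint16_t lfu_eval(unsigned op, uint16_t s, uint16_t d)
{
    uint16_t r = 0;
    if (op & 1u) r |= (uint16_t)(~s & ~d);
    if (op & 2u) r |= (uint16_t)(~s &  d);
    if (op & 4u) r |= (uint16_t)( s & ~d);
    if (op & 8u) r |= (uint16_t)( s &  d);
    return r;
}
```

Taken literally, the bit definitions make code 6 come out as SRC ^ DST and
code 9 as ! (SRC ^ DST), the other way round from the labels in the list.
Which of the two is right has not been checked here. If the bit definitions
are the correct ones, lfu_eval(9, s, 0xFFFF) returning s would fit the
B_DSTD experiment described further down.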

control:
      bit25: CMPDST   compare destination pixel with pattern pixel
      bit26: BCOMPEN  bit compare write inhibit
      bit27: DCOMPEN  data compare write inhibit
      bit28: BKGWREN  unknown
      bit29: BUSHI    hog the bus
      bit30: SRCSHADE source shading

   bit25:   If you enable this the destination pixel (that will be 
   overwritten) is compared with the value stored in the pattern-data 
   register (B_PATD). 
   If you enable this in conjunction with B_STOP this _maybe_ is used as 
   a way to do hardware collision detection. (like in GTIA on the Atari
   8 bit)

   bit26:   speculation: The lower 8 bit of the source value are examined.
   If all bits are 0 then nothing will be written back, if all of them
   are set then everything will be written back. Now what happens if there
   are just a few bits set ? 
   Imagine that the pixels of the destination pixmap are numbered from
   7 to 0 wrapping at -1 back to 7.
      
     Start of line blit
        \                    
         7654321076543210765
               ^
               current pixel position     

     Source pixel value:   0xFF55  ->  11111111 01010101

         76543210
         01010101
               ^
   
     So this pixel value will not be written.
   
   Don't ask me what that might be good for.   

   
   bit27:   used in conjunction with bit25. If you set bit25 and bit27 the
   effect will be that only those destination values will be overwritten 
   that do not match the value stored in B_PATD. 
   So if you put the color 0x0000 into B_PATD only those pixels will be 
   written, where there are not zerovalued pixels in the destination bitmap. 
   You should have DSTEN on!

   bit28:   still no idea yet

   bit29:   seems to let the blitter hog the bus completely. This is not such
   a good idea for extensive blits, since apparently the OP is also shut off
   and you'll see garbage on the screen. For small blits this might yield
   an overall system performance increase, when you're pushing the machine
   to its limits.

   bit30:   Enable source shading. Yes it does work, although the setup is
   a bit weird because you seem to have to set bit3 (destination read enable)
   for real source shading to happen. Put the shade value into B_IINC. Looks
   really cool.
   You can get some funky albeit as yet unpredictable (?) effects putting
   a value in B_DSTD and disabling the destination read. F.e. put the 
   B_IINC to $40000 and blit repeatedly incrementing B_DSTD (and delaying
   a little between blits). It's psychedelic!
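
To tie the B_CMD fields together, here is a C sketch that builds a few
command words from the bit positions listed in this section. The bit
positions and the SRCEN, DSTEN, PATDSEL, LFU_S and LFU_SAD names are taken
from the descriptions above; the LFU() macro and the function names are made
up for illustration.

```c
#include <stdint.h>

/* Bit positions as listed in the B_CMD description. */
#define SRCEN    (UINT32_C(1) << 0)    /* source data read enable */
#define DSTEN    (UINT32_C(1) << 3)    /* destination data read enable */
#define PATDSEL  (UINT32_C(1) << 16)   /* use pattern data */
#define LFU(op)  ((uint32_t)(op) << 21) /* 4 bit logical op, bits 21..24 */
#define LFU_S    LFU(12)               /* DST = SRC */
#define LFU_SAD  LFU(8)                /* DST = SRC & DST */

/* Straight copy: read the source, no destination read needed. */
uint32_t cmd_copy(void) { return SRCEN | LFU_S; }

/* DST &= SRC: a read-modify-write, so the destination read is enabled. */
uint32_t cmd_and(void)  { return SRCEN | DSTEN | LFU_SAD; }

/* Single color fill from B_PATD: no source read at all. */
uint32_t cmd_fill(void) { return PATDSEL; }

/* Status read of B_CMD: bit 0 set means the blitter is idle. */
int blitter_idle(uint32_t status) { return (int)(status & 1u); }
```

On the real machine the resulting value would be written to B_CMD, which
starts the blit, and B_CMD would then be polled until blitter_idle() holds.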


B_COUNT:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |                n_lines              |               n_pixels            |
  +-------------------------------------+-----------------------------------+
                   16 bit                               16 bit

   n_pixels:   number of pixels to draw in a line
   n_lines:    number of lines to draw 

   You need to draw at least one line of size one pixel. After n_pixels are 
   drawn the STEP registers are applied to the current pixel position and
   blitting resumes.
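
   Packing the two halves is simple enough, but for completeness a small C
   sketch (the function name is made up):

```c
#include <stdint.h>

/* n_lines goes into the upper word, n_pixels into the lower word. */
uint32_t b_count(uint16_t n_lines, uint16_t n_pixels)
{
    return ((uint32_t)n_lines << 16) | n_pixels;
}
```

   b_count(1, 64) gives $00010040, one line of 64 pixels, and
   b_count(64, 64) gives $00400040, the two values used in the BUGS section.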


B_IINC + B_I0-3:
---------------

 32       28        24        20       16       12        8        4        0
  +--------+---------+---------^--------+--------^--------^--------^--------+
  |chroma.i|chroma.f |  intensity.i     |             intensity.f           |
  +--------+---------+------------------+-----------------------------------+
     4 bit    4 bit          8 bit                       16 bit

  chroma.i:             delta for chroma change, integer part
  chroma.f:             delta for chroma change, fractional part
  intensity.i           delta for gouraud shading, integer part
  intensity.f:          delta for gouraud shading, fractional part
  
  This register is used for chroma changes and gouraud shading (or 
  both operations together). Chroma changes are like gouraud shades,
  but with an intensity delta of zero. Pure gouraud shadings have a chroma
  delta value of zero.
  This register is added to either B_PATD in pixelmode or B_I0 to B_I3 in 
  phrasemode. The intensity is saturation added, meaning that you can't 
  have an intensity wrap around. The chroma change on the other hand does
  wrap around. The integer part is by the way sign extended.
  So normally chroma and intensity are two separate entities that don't 
  influence each other. If you want, you can set either TOPBEN or TOPNEN
  in the B_CMD register. 
  If you set TOPNEN then the carry of the saturation add will be added to
  the upper nybble (chroma.i) of the current source data value.
  If you set TOPBEN then there will be _no_ saturation for the addition 
  of the intensity delta. Instead the carry is added to the top byte of 
  the current source data value. 
  
  B_I0, B_I1, B_I2, B_I3 are used in phrasemode instead of B_IINC, 
  which is used in pixelmode.
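
  A B_IINC value can be assembled from its four fields as in the diagram.
  The C sketch below assumes exactly that layout (4 + 4 + 8 + 16 bits); the
  function name is made up.

```c
#include <stdint.h>

/* Pack the four B_IINC fields according to the register diagram:
 * chroma.i (4 bit), chroma.f (4 bit), intensity.i (8 bit),
 * intensity.f (16 bit). Excess bits of each argument are masked off. */
uint32_t b_iinc(unsigned chroma_i, unsigned chroma_f,
                unsigned intensity_i, unsigned intensity_f)
{
    return ((uint32_t)(chroma_i    & 0xFu)  << 28) |
           ((uint32_t)(chroma_f    & 0xFu)  << 24) |
           ((uint32_t)(intensity_i & 0xFFu) << 16) |
            (uint32_t)(intensity_f & 0xFFFFu);
}
```

  b_iinc(0, 0, 4, 0) gives $00040000, an intensity step of 4.0 per pixel
  with no chroma change. Since the integer part is sign extended,
  b_iinc(0, 0, 0xFF, 0x8000) would be a step of -0.5.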


B_DSTD:
------
 32       28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^--------+
0 |                                 pixelvalue                              |
  +-------------------------------------------------------------------------+

 64       60        56        52       48       44       40       36        32
  +--------^---------^---------^--------^--------^--------^--------^--------+
1 |                                 pixelvalue                              |
  +-------------------------------------------------------------------------+
   
  pixelvalue:
  
  If you're doing RMW-cycles with the blitter and have not enabled data
  reads, then this register will be used as input for the logical operations
  instead.
  
  Depending on the blittermode (pixelmode or phrasemode) there is either
  only one pixel kept in here (phrase 0) rightside aligned by the way,
  or as many pixels that can fit in a phrase.
  
  Experiments show that the value in DSTD is NOTed before being used in
  the logical operation. Curious.
  
  f.e.
	move.l	#(1<<16)|WIDTH,B_COUNT
	move.l	#PITCH1|PIXEL16|WID320|XADDPIX,A1_FLAGS
	move.l	#PITCH1|PIXEL16|WID320|XADDPIX,A2_FLAGS
	move.l	#$0000FFFF,B_DSTD
	move.l	#SRCEN|LFU_XOR,d0
	move.l	d0,B_CMD		; writing B_CMD starts the blit

	is actually a straight replacement, although one would expect
	
		S ^ 0xFFFF to yield ~S and not S
  
	

B_SRCD:
------
 32       28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^--------+
0 |                                 pixelvalue                              |
  +-------------------------------------------------------------------------+

 64       60        56        52       48       44       40       36        32
  +--------^---------^---------^--------^--------^--------^--------^--------+
1 |                                 pixelvalue                              |
  +-------------------------------------------------------------------------+
   
  pixelvalue:
  
   This is probably just the same as B_DSTD but for those cases when you
   did not have source read enabled (bit #0 of the CMD register) and
   when you haven't selected the pattern as the source of your blit.



B_STOP:
------
  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^-+------+
  |                                  unused                          | stop |
  +------------------------------------------------------------------+------+

   stop: 
      bit 0:
      bit 1:
      bit 2:

   Uses 3 bits to resume or stop after a write inhibit occurs. An inhibit 
   will occur when painting in pixel-by-pixel mode with Xadd=1, BKGWREN=0, and
   one of BCOMPEN, DCOMPEN or the ZMODE bits 0-2 set, with matching conditions.

   ?????????????????????????


A1_BASE:
-------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^--------+
  |                                  address                                |
  +-------------------------------------------------------------------------+

   address:

   Pointer to the bitmap. The bitmap must (probably) be phrase aligned.
   For pixel positioning use A1_PIXEL



A1_FLAGS:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^--------+---------+--+-----^-----+--^---+----^-+------+
  |            unused         | addctl  |  |   width   | z-off| depth| pitch|
  +---------------------------+---------+--+-----------+------+------+------+
                                20...16       14...9     8..6    5..3  2..0
                                          ^
pitch:                                    +---- unused 
      bit0-bit2: 

      0: 1 phrase
      1: 2 phrases
      2: 4 phrases
      3: 8 phrases

   The amount of phrases the blitter should add to the address when
   accessing the next phrase. Usually set to zero eh ?

depth:
      bit3-bit5:

         colors         bitplanes           bits
       ------------+-------------------+-------------
      0: 2                 1                 1
      1: 4                 2                 2
      2: 16                4                 4
      3: 256               8                 8
      4: 32768/65536       16/CrY           16
      5: 16 mio            24               32
      6: unused
      7: unused

   The pixel size the blitter should move. Remember all pixels on the
   Jaguar are chunky (meaning the bits to a pixel are adjacent, not like
   on the Amiga or the ST)

z-offset (z-off):

      bit6-bit8 gives the number of phrases the Z-data is offset from
      your pixel phrase. Or so...
      Apparently 0 and 7 are unusable values

width:
   bit9-14:

      This is the width in pixels of a scanline of the area pointed
      to by A1. Or in different words A1 points to a rectangular
      block of pixels. The pixels are organized in horizontal strips.
      You give the width of such a strip with this value.

      The number is not an integer value but rather a floating point
      value (no kidding). It is made up like this:

      1.[bit10-9] * 2^[bit14-11] so for example

      0101 01   would be 1.25 * 2^5 = 40     (value 21) or
      1000 10   would be 1.5  * 2^8 = 384    (value 34)

         (1.00bin -> 1.00dec  1.01bin -> 1.25dec  
          1.10bin -> 1.5dec   1.11bin -> 1.75dec)

                  or you can think of it as:
      
      x   = 1 << [bit14-11] 
      res = x + (bit10 ? (x >> 1) : 0) + (bit9 ? (x >> 2) : 0);
      
      0101 01   would be 
         x   = 1 << 5                              /* 32 */
         res = 32 + (0 ? 16 : 0) + (1 ? 8 : 0);    /* 40 */

      Some often used values are:

          value   width    value   width    value   width
         -------+-------  -------+-------  -------+-------
            4       2        8       4       10       6
           12       8       13      10       14      12
           15      14       16      16       17      20
           18      24       19      28       20      32
           21      40       22      48       23      56
           24      64       25      80       26      96
           27     112       28     128       29     160
           30     192       31     224       32     256
           33     320       34     384       35     448
           36     512       37     640       38     768
           39     896       40    1024       41    1280
           42    1536       43    1792       44    2048
           45    2560       46    3072       47    3584
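
      The table can be reproduced by treating the upper four bits of the
      6 bit value as the power of two and the lower two bits as the
      fraction (.00 .25 .50 .75). A C sketch of that reading follows; the
      function names are made up and the layout is only what the table
      values imply.

```c
#include <stdint.h>

/* value is the 6 bit width field (A1_FLAGS bits 9..14 shifted down). */
uint32_t width_decode(unsigned value)
{
    unsigned exp  = (value >> 2) & 0xFu;   /* power of two */
    unsigned frac =  value       & 0x3u;   /* .00 .25 .50 .75 */
    uint32_t x = (uint32_t)1 << exp;

    return x + ((frac & 2u) ? (x >> 1) : 0u)
             + ((frac & 1u) ? (x >> 2) : 0u);
}

/* Returns the smallest field value that decodes to exactly this width,
 * or -1 if the width can not be represented. */
int width_encode(uint32_t width)
{
    for (unsigned v = 0; v < 64; v++)
        if (width_decode(v) == width)
            return (int)v;
    return -1;
}
```

      width_decode(33) is 320 and width_encode(320) is 33 again, while a
      width like 100 has no encoding and returns -1.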

         
adding control (addctl)

** please note that the bit descriptions are as an exception
** interleaved

   Xadd control
   bit16-17:
      0: XADDPHR   add phrase offset to X and truncate
      1: XADDPIX   add pixelsize (1) to X
      2: XADD0     add zero (for those nice vertical lines)
      3: XADDINC   add the contents of the increment register

   bit19: XSIGNADD/XSIGNSUB  pixel add operation, 0 = add 1 = subtract
                             when using "add pixelsize" mode

   If you don't set any of these bits (0) then you are using the blitter
   in phrase mode. That means that pixels are grabbed in lots of phrases
   updated concurrently (one step) and written back in lots of phrases.
   (lots as in "quantitysize", not as "many"). Obviously you can use the
   phrasemode only for horizontal line blitting operations. Else you
   need to put the Blitter in pixel mode (in CryMode ~4x slower).


   Yadd control
   bit18: YADD0/YADD1        add zero (clear) or one (set) to Y
   bit20: YSIGNADD/YSIGNSUB  add 1/sub1 to Y (when bit18 is set)
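
   Putting the A1_FLAGS fields together, a flags word like the
   PITCH1|PIXEL16|WID320|XADDPIX used elsewhere in this text could be built
   as in the C sketch below. The values are derived from the field
   descriptions above and not taken from any official header; the two
   function names are made up.

```c
#include <stdint.h>

#define PITCH1    (UINT32_C(0)  << 0)   /* pitch field 0: 1 phrase */
#define PIXEL16   (UINT32_C(4)  << 3)   /* depth field 4: 16 bit pixels */
#define WID320    (UINT32_C(33) << 9)   /* width field 33: 320 pixels */
#define XADDPHR   (UINT32_C(0)  << 16)  /* phrase mode */
#define XADDPIX   (UINT32_C(1)  << 16)  /* add pixelsize to X */
#define XADD0     (UINT32_C(2)  << 16)  /* add zero to X */
#define XADDINC   (UINT32_C(3)  << 16)  /* add increment register */
#define YADD1     (UINT32_C(1)  << 18)  /* add one to Y */
#define XSIGNSUB  (UINT32_C(1)  << 19)  /* subtract instead of add (X) */
#define YSIGNSUB  (UINT32_C(1)  << 20)  /* subtract instead of add (Y) */

/* 320 pixels wide, 16 bit pixels, stepping one pixel at a time. */
uint32_t flags_pixel_320x16(void)
{
    return PITCH1 | PIXEL16 | WID320 | XADDPIX;
}

/* Xadd control 0 means the channel runs in phrase mode. */
int flags_is_phrasemode(uint32_t flags)
{
    return ((flags >> 16) & 3u) == 0;
}
```

   flags_pixel_320x16() comes out as $00014220. Replacing XADDPIX with
   XADDPHR gives the phrase mode variant of the same bitmap layout.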



A1_CLIP:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |                height               |               width               |
  +-------------------------------------+-----------------------------------+

   Height and width are the height and the width that the blitter should
   clip at, starting from the base. It does work but seems to be buggy.
   (i.e. sometimes clips one pixel too early)
   

A1_PIXEL:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |                 Y.i                 |               X.i                 |
  +-------------------------------------+-----------------------------------+
   
   X.i:         horizontal position (integer part)
   Y.i:         likewise vertical pixel offset 
   
        horizontal and vertical pixel offset from A1_BASE where the 
        blitting operation should begin. Note that for the calculation
        of the proper address offset, the blitter needs to know the
        pixel size and the width of one line
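
        A sketch in C of that address calculation, under the assumptions
        that pixels are stored line after line without gaps (pitch of 1
        phrase) and that the window starts at A1_BASE. The function names
        are made up, and the real hardware may well do this differently.

```c
#include <stdint.h>

/* Bit offset of pixel (x, y) from the base address.
 * width is the line width in pixels, bits the pixel depth (1..32). */
uint32_t pixel_bit_offset(uint32_t x, uint32_t y, uint32_t width,
                          uint32_t bits)
{
    return (y * width + x) * bits;
}

/* Byte offset of the byte that holds the first bit of pixel (x, y). */
uint32_t pixel_byte_offset(uint32_t x, uint32_t y, uint32_t width,
                           uint32_t bits)
{
    return pixel_bit_offset(x, y, width, bits) >> 3;
}
```

        For a 320 pixel wide bitmap with 16 bit pixels, pixel (10, 2) sits
        (2 * 320 + 10) * 2 = 1300 bytes after the base.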
        

A1_FPIXEL:
---------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |                 Y.f                 |                X.f                |
  +-------------------------------------+-----------------------------------+

   X.f:         horizontal position (fractional part)
   Y.f:         likewise vertical pixel offset 

  You can position the pixel value at a fractional pixel value using this
  register. If you're using fractional stepping rates, this register will
  be updated as well as the integer A1_PIXEL register.

  Guess: 0.FFFF will still address pixel 0 and will not round up
  to 1.


A1_INC:
------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               unused                |          increment.i              |
  +-------------------------------------+-----------------------------------+

  increment.i:          integer delta added to pixel position
  
     offset to add to A1_PIXEL after a pixel has been blitted.
     this register is used only, if a certain addressing mode 
     (bit16... of A1_FLAGS) is used.
     Please also note the update A1 bits in the blitter command
     for proper operation.
             
     The size is not yet known, but believed to be 16 bit
   

A1_FINC:
-------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               unused                |            increment.f            |
  +-------------------------------------+-----------------------------------+

  increment.f:          fractional delta added to pixel position
         As above but this is the fractional part of the stepper.
         

A1_STEP:
-------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               y_step.i              |               x_step.i            |
  +-------------------------------------+-----------------------------------+

  x_step.i:     value to be added to A1_PIXEL.x
  y_step.i:     value to be added to A1_PIXEL.y
  
  Values added to the pixelpointer after a line has been drawn.
  You must set a bit in the control register to allow this update to happen.
  

A1_FSTEP:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               y_step.f              |               x_step.f            |
  +-------------------------------------+-----------------------------------+

  x_step.f:     value to be added to A1_FPIXEL.x
  y_step.f:     value to be added to A1_FPIXEL.y
  
  Values added to the pixelpointer after a line has been drawn.
  You must set a bit in the control register to allow this update to happen,
  which is different than the control bit used for integer step updates!




A2_BASE:
-------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------^--------^--------^--------^--------+
  |                                  address                                |
  +-------------------------------------------------------------------------+

   See A1_BASE



A2_FLAGS:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^--------+^--------+--+-----^-----+--^---+----^-+------+
  |            unused         | addctl  |  |   width   | z-off| depth| pitch|
  +---------------------------+---------+--+-----------+------+------+------+
                                20...16       14...9     8..6    5..3  2..0

   See A1_FLAGS



A2_PIXEL:
--------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |                 Y.i                 |               X.i                 |
  +-------------------------------------+-----------------------------------+

   See A1_PIXEL
        

A2_MASK:
------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               mask.y                |              mask.x               |
  +-------------------------------------+-----------------------------------+

   mask.x:              x modulo value
   mask.y:              y modulo value

   A2_MASK is probably used to mask off the pixel value of A2 creating
   thereby a circular buffer
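
   If that guess is right, the effect would be like the following C
   sketch: the coordinate is ANDed with the mask, so with a mask of 63 the
   position wraps every 64 pixels. This only illustrates the guess and is
   not confirmed behavior; the function name is made up.

```c
#include <stdint.h>

/* Wrap a pixel coordinate with an AND mask (mask = size - 1 for sizes
 * that are a power of two). */
uint16_t wrap_coord(uint16_t coord, uint16_t mask)
{
    return (uint16_t)(coord & mask);
}
```

   wrap_coord(70, 63) gives 6, i.e. the seventh pixel of a 64 pixel wide
   texture that repeats itself.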


A2_STEP:
-------

  32      28        24        20       16       12        8        4        0
  +--------^---------^---------^--------+--------^--------^--------^--------+
  |               y_step.i              |               x_step.i            |
  +-------------------------------------+-----------------------------------+

   See A1_STEP

  


BUGS:
-----

It seems that when the blitter is done with blitting it is so happy that
it's done, it forgets to update the PIXEL registers with the STEP 
registers (UPDA1 UPDA2 UPDA1F).

Therefore this:

   move.l   #$00400040,B_COUNT
   move.l   #VALUE,B_CMD

        is not equivalent to this

   moveq    #$3F,d1
.loop:
   move.l   #$00010040,B_COUNT
   move.l   #VALUE,B_CMD
.wait:
   move.l   B_CMD,d0
   ror.w    #1,d0
   bcc.b    .wait
 
   dbra     d1,.loop
  


Clipping seems to be buggy, occasionally clipping too early. (??)


DISCUSSION:
----------

The designers of the Jaguar thought a lot about what makes a good
blitter and what doesn't. Important for a good blitter is that the setup time
is minimal. One of the reasons the Atari ST Blitter wasn't that successful
was that in the time needed to set up the chip - and you did have to 
set up and calculate quite a lot of values - you were typically just 
about done doing the blit in software (a small exaggeration).

The Jaguar blitter needs minimal setting up, because the hardware does a
lot of the calculations for you. For doing a memory clear f.e. you need to
write only five registers:

   A1_BASE
   A1_PIXEL
   A1_FLAGS
   B_CMD
   B_PATD


With a little imagination you will find a lot of uses for this nice
piece of hardware, that go way outside of just drawing and filling.
