 Xref: tsoft alt.binaries.pictures.d:1357 alt.binaries.pictures.erotica.d:241 alt.sex.pictures.d:1773
 From: tgl+@cs.cmu.edu (Tom Lane)
 Newsgroups: alt.binaries.pictures.d,alt.binaries.pictures.erotica.d,alt.sex.pictures.d
 Subject: All about JPEG (again)
 Summary: trying to clear up some of the confusion
 Keywords: JPEG, image compression, FAQ
 Date: 5 Nov 91 16:07:05 GMT
 Followup-To: alt.binaries.pictures.d
 Organization: School of Computer Science, Carnegie Mellon
 Lines: 376
 Nntp-Posting-Host: g.gp.cs.cmu.edu
 Originator: tgl@G.GP.CS.CMU.EDU
 [This is a repost, with some minor changes, of a posting I made October 22.
 An awful lot of people on these groups evidently didn't read it then.
 In particular, the end of the article tells where to get JPEG software.]
 Recent posts have made it clear that some folks are still in the dark about
 what JPEG is, while others think they know what it is but are harboring
 misconceptions.  Herewith is some authoritative (I hope) information about
 what JPEG can and can't do, where you can get software for it, etc. etc.
 It may be worth turning this into a FAQ file.  Suggestions for additions
 and clarifications would be welcome.
 This article includes the following sections:
 1.  What is JPEG?
 2.  Why use JPEG?
 3.  How well does it work?
 4.  What about lossless JPEG?
 5.  What's all this hoopla about color quantization?
 6.  When should I use JPEG, and when should I stick with GIF?
 7.  How does JPEG work?
 8.  Why all the argument about file formats?
 9.  And what's all this about arithmetic coding?
 10.  Where can I get JPEG software?
 Sections 4-9 can be skipped unless you are interested in details.
 1.  What is JPEG?
 JPEG is a standardized image compression mechanism.  JPEG stands for Joint
 Photographic Experts Group (the original name of the committee that wrote
 the standard).  JPEG is designed for compressing either full-color or
 gray-scale digital images of "natural" (real-world) scenes.  JPEG does not
 handle black-and-white (1-bit-per-pixel) images, nor does it handle motion
 picture compression.  (There are related committees, JBIG and MPEG
 respectively, working on standards for compressing those types of images.)
 JPEG is "lossy", meaning that the image you get out of decompression isn't
 quite identical to what you put in.  The algorithm achieves much of its
 compression by exploiting known limitations of the human eye; notably, the
 fact that small color details aren't perceived as well as small details of
 light-and-dark.  Thus, JPEG is intended for compressing images that will be
 looked at by humans.  If you plan to machine-analyze your images, the small
 errors introduced by JPEG may be a problem for you, even if they are
 invisible to the eye.
 A useful property of JPEG is that the degree of lossiness can be varied by
 adjusting compression parameters.  This means that the image maker can trade
 off file size against output image quality.  You can make *extremely* small
 files if you don't mind poor quality; this is useful for indexing image
 archives, making thumbnail views or icons, etc. etc.  Conversely, if you
 aren't happy with the output quality at the default compression setting, you
 can jack up the quality until you are happy, and accept lesser compression.
 2.  Why use JPEG?
 Basically, to make your image files smaller.  This is a big win for
 transmitting files across networks and for archiving libraries of images.
 Being able to compress a 2 Mbyte full-color file down to 100 Kbytes or so
 makes a big difference in disk space or transmission time!  (If you are
 comparing GIF and JPEG, the size ratio is more like four to one.  More
 details below.)
 Unless your viewing software supports JPEG directly, you'll have to convert
 JPEG to some other format for viewing or manipulating images.  Thus, using
 JPEG is essentially a time/space tradeoff: you give up some time in order to
 store or transmit an image more cheaply.
 It's worth noting that when network or phone transmission is involved, the
 time savings from transferring a shorter file can be much greater than the
 extra time to decompress the file.  I'll let you do the arithmetic yourself.
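
 For anyone who would rather let the machine do that arithmetic, here is a
 rough sketch.  The two file sizes are the ship.gif and ship.jpg75 figures
 from section 3; the 2400 bit/s line speed, the 10 bits per byte on the wire,
 and the 60-second decompression time are made-up assumptions, so substitute
 your own numbers.

```python
# Rough transfer-time comparison.  Sizes are the ship.gif and ship.jpg75
# figures from section 3; everything else is an assumption for illustration.
GIF_BYTES = 240438
JPEG_BYTES = 57995
LINE_BITS_PER_SEC = 2400      # assumed modem speed
BITS_PER_BYTE_ON_WIRE = 10    # assumed: 8 data bits plus start and stop bits
DECOMPRESS_SECONDS = 60.0     # assumed JPEG decompression time


def transfer_seconds(num_bytes, bits_per_sec=LINE_BITS_PER_SEC):
    """Seconds needed to send num_bytes over the assumed serial link."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / bits_per_sec


gif_total = transfer_seconds(GIF_BYTES)
jpeg_total = transfer_seconds(JPEG_BYTES) + DECOMPRESS_SECONDS

print("GIF : %.0f seconds to transfer" % gif_total)
print("JPEG: %.0f seconds to transfer and decompress" % jpeg_total)
print("JPEG comes out ahead by %.0f seconds" % (gif_total - jpeg_total))
```

 On a fast local network the balance can tip the other way, since the
 transfer term shrinks while the decompression time stays put.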
 3.  How well does it work?
 Pretty darn well.  Here are some sample file sizes for an image I have handy,
 a 727x525 full-color image of a ship in a harbor.  The first three files are
 for comparison purposes; the rest were created with the free JPEG software
 described at the end of this file.
 File      Size in bytes  Comments
 ship.ppm        1145040  Original file in PPM format (no compression)
 ship.ppm.Z       963829  PPM file passed thru Unix compress
                          compress doesn't accomplish a lot, you'll note.
 ship.gif         240438  Converted to GIF with ppmquant -fs 256 | ppmtogif
                          Most of the savings is the result of losing color
                          info: GIF stores 8 bits/pixel, not 24.  (See sec. 5.)
 ship.jpg100      315600  cjpeg -Q 100   (highest quality setting)
                          This is indistinguishable from the 24-bit original,
                          at least to my nonprofessional eyeballs.
 ship.jpg75        57995  cjpeg -Q 75    (default setting)
                          You have to look mighty darn close to distinguish this
                          from the original, even with both on-screen at once.
 ship.jpg50        38399  cjpeg -Q 50
                          This has slight defects; if you know what to look
                          for, you could tell it's been JPEGged without seeing
                          the original.  Still at or above the quality of
                          typical recent postings in Usenet pictures groups.
 ship.jpg25        25186  cjpeg -Q 25
                          Visible blockiness (djpeg -b helps some).  Much
                          higher quality than a GIF of comparable size, though.
 ship.jpg5o         6597  cjpeg -Q 5 -o
                          Blocky, but perfectly satisfactory for preview or
                          indexing purposes.
 In this case JPEG can make a file that's a factor of four or five smaller
 than a GIF of comparable quality.  This seems to be a typical ratio for
 real-world scenes.
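
 To put numbers on those ratios, the short sketch below recomputes them from
 the sizes in the table.  Which JPEG setting counts as "comparable quality"
 to the GIF is a judgment call that the script cannot make for you; it simply
 prints every ratio.

```python
# File sizes in bytes, copied from the table in this section.
sizes = {
    "ship.ppm": 1145040,
    "ship.ppm.Z": 963829,
    "ship.gif": 240438,
    "ship.jpg100": 315600,
    "ship.jpg75": 57995,
    "ship.jpg50": 38399,
    "ship.jpg25": 25186,
    "ship.jpg5o": 6597,
}

original = sizes["ship.ppm"]
gif = sizes["ship.gif"]

# How many times smaller each file is than the uncompressed original
# and than the GIF.
ratios = {
    name: (original / size, gif / size)
    for name, size in sizes.items()
}

for name, (vs_original, vs_gif) in ratios.items():
    print("%-12s %5.1f x smaller than PPM, %5.2f x smaller than GIF"
          % (name, vs_original, vs_gif))
```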
 GIF does significantly better on images with only a few distinct colors,
 such as cartoons or line art.  JPEG can't squeeze these files as much as GIF
 does without introducing highly visible defects.  This sort of image is best
 left in GIF form.
 4.  What about lossless JPEG?
 There's a great deal of confusion on this subject.  The JPEG committee did
 define a truly lossless compression algorithm, i.e., one that guarantees the
 final output is bit-for-bit identical to the original input.  However, this
 lossless mode has almost nothing in common with the regular, lossy JPEG
 algorithm.  As far as I know, the lossless JPEG mode is not implemented in
 any software available to the public.
 Saying "-Q 100" to the free JPEG software DOES NOT get you a lossless image.
 What it does get rid of is deliberate information loss in the coefficient
 quantization step.  There is still a good deal of information loss in the
 color subsampling step.  (There should be a command line switch to disable
 subsampling, but as of today, there isn't one.)
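
 To see why subsampling alone loses information, consider the toy sketch
 below.  It assumes the color (chroma) plane is halved in each direction by
 averaging 2x2 blocks and restored by simple replication; those two choices
 are illustrative assumptions, not a description of what the free software
 actually does.

```python
def subsample_2x2(plane):
    """Average each 2x2 block of a chroma plane (assumes even dimensions)."""
    height, width = len(plane), len(plane[0])
    return [
        [
            (plane[y][x] + plane[y][x + 1]
             + plane[y + 1][x] + plane[y + 1][x + 1] + 2) // 4
            for x in range(0, width, 2)
        ]
        for y in range(0, height, 2)
    ]


def upsample_2x2(small):
    """Rebuild a full-size plane by repeating every sample in a 2x2 square."""
    out = []
    for row in small:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# A 4x4 chroma plane carrying single-pixel detail: a checkerboard of 100/200.
chroma = [[100 if (x + y) % 2 == 0 else 200 for x in range(4)]
          for y in range(4)]

halved = subsample_2x2(chroma)
restored = upsample_2x2(halved)

print("original:", chroma[0])
print("restored:", restored[0])
```

 The checkerboard comes back as a flat field: the pixel-level color detail is
 gone no matter how gently the later steps treat what is left.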
 Even with both quantization and subsampling turned off, the standard JPEG
 algorithm is not truly lossless, because it is subject to roundoff errors in
 various calculations.  The maximum error is a few counts in any one pixel
 value; it's highly unlikely that this could be perceived by the human eye,
 but it might be a concern if you are doing machine processing of an image.
 At this minimum-loss setting, standard JPEG produces files that are perhaps
 half the size of an uncompressed 24-bit-per-pixel image.  JPEG's true
 lossless mode is reputed to provide roughly the same amount of compression.
 Those in the know do not regard this as state-of-the-art performance for
 lossless image compression; if you need lossless compression, you may be
 well advised to wait for the upcoming JBIG standard.
 5.  What's all this hoopla about color quantization?
 Most people don't have full-color (24 bit per pixel) display hardware.
 Typical display hardware stores 8 or fewer bits per pixel, so it can display
 256 or fewer distinct colors at a time.  To display a full-color image, the
 computer must map the image into an appropriate set of representative
 colors.  This process is called "color quantization" (not to be confused
 with the coefficient quantization done internally by JPEG).
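
 As a concrete (and deliberately crude) illustration, the sketch below maps
 24-bit colors onto a fixed 256-entry palette by keeping 3 bits of red, 3 of
 green, and 2 of blue.  This fixed "3-3-2" scheme is an assumption chosen
 because it fits in a few lines; a good quantizer such as ppmquant instead
 picks a palette adapted to the particular image.

```python
def quantize_332(r, g, b):
    """Map a 24-bit RGB color to an 8-bit palette index (3-3-2 bits)."""
    return ((r >> 5) << 5) | ((g >> 5) << 2) | (b >> 6)


def palette_color(index):
    """Representative 24-bit color for an index: the center of its bucket."""
    r = (index >> 5) & 0x07
    g = (index >> 2) & 0x07
    b = index & 0x03
    return (r * 32 + 16, g * 32 + 16, b * 64 + 32)


# Two visibly different input colors...
first = (200, 120, 40)
second = (210, 125, 50)

# ...land on the same palette entry, so the difference between them is lost.
first_index = quantize_332(*first)
second_index = quantize_332(*second)

print(first, "->", first_index, "->", palette_color(first_index))
print(second, "->", second_index, "->", palette_color(second_index))
```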
 Clearly, color quantization is a lossy process.  It turns out that for most
 images, the details of the color quantization algorithm have MUCH more impact
 on the final image quality than do any errors introduced by JPEG (except at
 the lowest JPEG quality settings).
 Since JPEG is inherently a full-color format, converting a JPEG image for
 display on 8-bit-or-less hardware requires color quantization.  A GIF image,
 by definition, has already been quantized to 256 or fewer colors.  For
 purposes of Usenet picture distribution, GIF has the advantage that the
 sender precomputes the color quantization and recipients don't have to.
 This is also the *disadvantage* of GIF: you're stuck with the sender's
 quantization.  If the sender quantized to a different number of colors than
 what you can display, you have to re-quantize, resulting in much poorer
 image quality than if you had quantized once from a full-color image.
 Furthermore, if the sender didn't use a high-quality color quantization
 algorithm, you're out of luck.
 For this reason, JPEG offers the promise of *significantly better* image
 quality for all users whose machines don't match the sender's display
 hardware.  JPEG's full color image can be quantized to precisely match the
 user's display hardware.  Furthermore, you will be able to take advantage of
 future improvements in quantization algorithms (there is a lot of active
 research in this area), or purchase better display hardware, to get a better
 view of JPEG images you already have.  With GIF, you're stuck forevermore
 with what was sent.
 It's also worth mentioning that many GIF-viewing programs include rather
 shoddy quantization routines.  If you view a 256-color GIF on a 16-color EGA
 display, for example, you are probably getting a much worse image than you
 need to.  This is partly an inevitable consequence of doing two color
 quantizations (one to create the GIF, one to display it), but often it's
 also due to sloppiness.  JPEG conversion programs will be forced to use
 high quality quantizers in order to get acceptable results at all, and in
 normal use they will quantize directly to the number of colors to be
 displayed.  Thus, JPEG is likely to provide better results than the average
 GIF program for low-color-resolution displays as well as high-resolution ones!
 The same considerations apply to gray-scale images, although quantization of
 gray scale is a much simpler problem.
 (Incidentally, the current "V1" release of the free JPEG software does NOT
 include a good color quantizer; we assume you have ppmquant from the PBMPLUS
 package.  For this reason we don't recommend using the free software as a
 JPEG->GIF converter.  This shortcoming will be fixed in the next release.)
 6.  When should I use JPEG, and when should I stick with GIF?
 For the reasons discussed above, JPEG is superior to GIF for storing and
 distributing full-color and gray-scale images of "realistic" scenes.
 JPEG is superior even if you don't have 24-bit display hardware, and it is
 a LOT superior if you do.
 GIF remains the superior format for cartoons, line drawings, and some other
 types of "non-realistic" images.  JPEG is not designed for good performance
 on this kind of image.
 If you have an existing library of GIF images, you may wonder whether you
 should convert it to JPEG.  You will lose some image quality if you do so,
 but the disk space savings may justify converting anyway.  (The preceding
 section, which argued that JPEG image quality is superior to GIF, only
 applies if both formats start from a full-color original.  If you start from
 a GIF, you've already irretrievably lost a great deal of information; JPEG
 can only make things worse.)
 Experience to date suggests that large, high-quality GIFs are the best
 candidates for conversion to JPEG.  They chew up the most storage so offer
 the most savings, and they convert to JPEG with minimum visible degradation.
 (Generally, JPEG won't compress low-quality input images as well as
 high-quality ones.)  Don't waste your time converting any GIF much under 100
 Kbytes.  Also, don't expect JPEG files converted from GIFs to be as small as
 those created directly from full-color originals.  For comparable quality
 you may have to let the converted files be as much as twice as big as
 straight-through JPEG files would be (i.e., shoot for 1/2 or 1/3rd the size
 of the GIF file, not 1/4th as shown in the earlier comparisons).
 7.  How does JPEG work?
 The buzz-words to know are chrominance subsampling, discrete cosine
 transforms, coefficient quantization, and Huffman or arithmetic entropy
 coding.  This article's long enough already, so I'm not going to say more
 than that.  For a good technical introduction, see Wallace's article in the
 April 1991 Communications of the ACM.
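
 For readers who want a taste of two of those buzz-words without tracking
 down the article, here is a toy sketch of a discrete cosine transform on one
 8x8 block followed by coefficient quantization.  It is not the free
 software's code: the single uniform step size of 16 is an assumption made
 for brevity (real JPEG uses a table of 64 step sizes), and the smooth
 gradient block is invented test data.

```python
import math

N = 8  # JPEG transforms the image in 8x8 blocks of samples


def dct_1d(values):
    """Orthonormal DCT-II of a length-8 sequence."""
    out = []
    for k in range(N):
        scale = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        total = sum(values[n] * math.cos((2 * n + 1) * k * math.pi / (2 * N))
                    for n in range(N))
        out.append(scale * total)
    return out


def dct_2d(block):
    """2-D DCT: transform every row, then every column of the result."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d([rows[y][x] for y in range(N)]) for x in range(N)]
    # cols[x][v] is the coefficient at row v, column x; put it back row-major.
    return [[cols[x][v] for x in range(N)] for v in range(N)]


def quantize(coefs, step):
    """Divide every coefficient by the step size and round to an integer."""
    return [[int(round(c / step)) for c in row] for row in coefs]


# Invented test data: a smooth left-to-right gradient, 100..107, with the
# usual level shift of 128 applied so samples are centered on zero.
block = [[(100 + x) - 128 for x in range(N)] for _ in range(N)]

quantized = quantize(dct_2d(block), 16)
nonzero = sum(1 for row in quantized for q in row if q != 0)

for row in quantized:
    print(row)
print("nonzero coefficients:", nonzero, "of", N * N)
```

 Smooth image areas collapse to a handful of nonzero numbers, and the long
 runs of zeros are what the entropy coder then stores cheaply.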
 8.  Why all the argument about file formats?
 Strictly speaking, JPEG refers only to a family of compression algorithms;
 it does *not* refer to a specific image file format.  The JPEG committee was
 prevented from defining a file format by turf wars within the international
 standards organizations.
 Since we can't actually exchange images with anyone else unless we agree on
 a common file format, this leaves us with a problem.  In the absence of
 official standards, a lot of JPEG program writers have just gone off to
 "do their own thing", and as a result their programs aren't compatible with
 anybody else's.
 The closest thing we have to a de-facto standard JPEG format is some work
 that's been coordinated by people at C-Cube Microsystems.  They have defined
 two JPEG-based file formats:
   * JFIF (JPEG File Interchange Format), a "low-end" format that transports
     pixels and not much else.
   * TIFF/JPEG, an extension of the Aldus TIFF format.  TIFF is a "high-end"
     format that will let you record just about everything you ever wanted to
     know about an image, and a lot more besides :-).  TIFF is a lot more
     complex than JFIF, and may well prove less transportable, because
     different vendors have historically implemented slightly different and
     incompatible subsets of TIFF.  It's not likely that adding JPEG to the mix
     will do anything to improve this situation.
 Both of these formats were developed with input from all the major vendors
 of JPEG-related products; it's reasonably likely that future commercial
 products will adhere to one or both standards.  (However, as of right now,
 October 1991, it's too early for many such products to have appeared.)
 A particular case that people may be interested in is Apple's QuickTime
 software for the Macintosh.  QuickTime uses a JFIF-compatible format wrapped
 inside the Mac-specific PICT structure.  Conversion between JFIF and
 PICT/JPEG should be pretty straightforward; in fact Apple may release a
 utility for the purpose.
 I believe that Usenet should adopt JFIF as the replacement for GIF in
 picture postings.  JFIF is simpler than TIFF and is available now; the
 TIFF/JPEG spec is still being hammered out.  Even when TIFF/JPEG is
 available, the JFIF format is likely to be a widely supported "lowest
 common denominator"; TIFF/JPEG files may never be as transportable.
 9.  And what's all this about arithmetic coding?
 The JPEG spec defines two different "back end" modules for the final output
 of compressed data: either Huffman coding or arithmetic coding is allowed.
 The choice has no impact on image quality, but arithmetic coding usually
 produces a smaller compressed file.  On typical images, arithmetic coding
 produces a file 5 or 10 percent smaller than Huffman coding.  (The numbers
 previously cited are all for Huffman coding.)
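
 For those who have not met Huffman coding before, the sketch below shows the
 general idea: frequent symbols get short codes, rare ones get long codes.
 It is a textbook construction on made-up symbol counts; the symbols JPEG
 actually codes, and the extra constraints the standard places on its code
 tables, are not modeled here.

```python
import heapq


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a dict of symbol counts."""
    # Heap entries are (weight, tiebreak, {symbol: depth}).  The integer
    # tiebreak keeps Python from ever having to compare two dicts.
    heap = [(weight, i, {symbol: 0})
            for i, (symbol, weight) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them moves one
        # level deeper, i.e. its code grows by one bit.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: depth + 1 for s, depth in left.items()}
        merged.update({s: depth + 1 for s, depth in right.items()})
        heapq.heappush(heap, (w1 + w2, counter, merged))
        counter += 1
    return heap[0][2]


# Made-up symbol counts for illustration.
counts = {"a": 5, "b": 2, "c": 1, "d": 1}

lengths = huffman_code_lengths(counts)
huffman_bits = sum(counts[s] * lengths[s] for s in counts)
fixed_bits = 2 * sum(counts.values())  # 2 bits each would cover 4 symbols

print("code lengths:", lengths)
print("Huffman: %d bits, fixed-length: %d bits" % (huffman_bits, fixed_bits))
```

 Huffman codes always spend a whole number of bits per symbol; arithmetic
 coding does not have that restriction, which is where its few extra percent
 come from.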
 Unfortunately, the particular variant of arithmetic coding specified by the
 JPEG standard is subject to patents owned by IBM, AT&T, and Mitsubishi.
 Thus *you cannot legally use arithmetic coding* unless you obtain licenses
 from these companies.  (The "fair use" doctrine allows people to implement
 and test the algorithm, but actually storing any images with it is dubious
 at best.)
 At least in the short run, I recommend that people not worry about
 arithmetic coding; the space savings isn't great enough to justify the
 potential legal hassles.  In particular, arithmetic coding *should not*
 be used for any images to be exchanged on Usenet.
 There is some small chance that the legal situation may change in the
 future.  Stay tuned for further details.
 10.  Where can I get JPEG software?
 Free, portable C code for JPEG compression is available from the Independent
 JPEG Group, which I lead.  A package containing our C source code,
 documentation, and some small test files is available from several places.
 The "official" archive site for this source code is uunet.uu.net (137.39.1.2
 or 192.48.96.2).  Look under directory /graphics/jpeg; the current release
 is jpegsrc.v1.tar.Z.  (This is a compressed TAR file; don't forget to
 retrieve in binary mode.)  You can retrieve this file by FTP or UUCP.
 Folks in Europe may find it easier to FTP from nic.funet.fi (see directory
 pub/graphics/programs/jpeg).  The source code is also available on
 CompuServe, in the GRAPHSUPPORT forum (GO PICS), library 14, as jpsrc.zip.
 This software has been tested on numerous Unix machines, PCs, Macs, and
 Amigas; we believe it can be ported to almost any machine that has a
 (reasonable) C compiler.
 We consider this to be a preliminary release.  The current software only
 handles conversion between JPEG and PBMPLUS image formats, so it must be
 used in conjunction with Jef Poskanzer's free PBMPLUS software.  (Well,
 actually it can read and write GIF files too, but writing GIF files doesn't
 work very well yet.)  Some operations will run out of memory on PCs and
 other non-virtual-memory machines.  These and other shortcomings will be
 fixed in future releases.
 We have released this software for both noncommercial and commercial use.
 Companies are welcome to use it as the basis for JPEG-related products.
 We do not ask a royalty, although we do ask for an acknowledgement in
 product literature (see the README file in the distribution for details).
 We hope to make this software industrial-quality --- although, as with
 anything that's free, we offer no warranty and accept no liability.
 The Independent JPEG Group is a volunteer organization; if you'd like to
 contribute to improving our software, you are welcome to join.
 If you are not reasonably handy at configuring and installing portable C
 programs, you may have some difficulty installing the free source code.
 Steve Davis (strat@cis.ksu.edu) has volunteered to maintain an archive of
 pre-built executable versions of the free JPEG code for various machines.
 His FTP archive is at procyon.cis.ksu.edu (129.130.10.80); look under
 /pub/JPEG to see what he currently has.
 In addition to the free JPEG software, I am aware of two shareware programs
 from Handmade Software (contact hsi@netcom.com for details).  Their software
 runs on PCs and on a limited number of Unix machines.  As of today, their
 software is faster and does better color quantization than the free JPEG
 software; but they had better have their running shoes on if they don't want
 to be surpassed soon.  (And of course, you're morally obligated to pay if
 you use their software.)
 There are numerous commercial JPEG offerings, with more popping up every
 day.  I recommend that you not waste your money unless you find the free
 software vastly too slow.  In that case, purchase a hardware-assisted
 product.  Ask hard questions about whether the product complies with the
 final JPEG standard and about whether it can handle the JFIF file format;
 an awful lot of the earliest commercial releases are not and never will be
 compatible with anyone else's files.
 --
                         tom lane
                         organizer, Independent JPEG Group
 Internet: tgl@cs.cmu.edu        BITNET: tgl%cs.cmu.edu@cmuccvma
 ------------
