Sunday, 18 March 2012

JPEG FILE STRUCTURE


In computing, JPEG (pronounced "jay-peg") is a commonly used method of lossy compression for digital photography (images). The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and image quality. JPEG typically achieves 10:1 compression with little perceptible loss in image quality.
JPEG compression is used in a number of image file formats. JPEG/Exif is the most common image format used by digital cameras and other photographic image capture devices; along with JPEG/JFIF, it is the most common format for storing and transmitting photographic images on the World Wide Web. These format variations are often not distinguished, and are simply called JPEG.
The term "JPEG" is an acronym for the Joint Photographic Experts Group, which created the standard. The MIME media type for JPEG is image/jpeg (defined in RFC 1341), except in Internet Explorer, which provides a MIME type of image/pjpeg when uploading JPEG images.
The JPEG standard specifies the codec, which defines how an image is compressed into a stream of bytes and decompressed back into an image, but not the file format used to contain that stream. The Exif and JFIF standards define the commonly used file formats for interchange of JPEG-compressed images.
JPEG standards are formally named as Information technology – Digital compression and coding of continuous-tone still images. ISO/IEC 10918 consists of the following parts:
JPEG standard – Parts
Part | ISO/IEC standard | ITU-T Rec. | First public release date | Latest amendment | Title | Description
Part 1 | ISO/IEC 10918-1:1994 | T.81 (09/92) | 1992 | | Requirements and guidelines |
Part 2 | ISO/IEC 10918-2:1995 | T.83 (11/94) | 1994 | | Compliance testing | Rules and checks for software conformance (to Part 1)
Part 3 | ISO/IEC 10918-3:1997 | T.84 (07/96) | 1996 | 1999 | Extensions | Set of extensions to improve Part 1, including the SPIFF file format
Part 4 | ISO/IEC 10918-4:1999 | T.86 (06/98) | 1998 | | Registration of JPEG profiles, SPIFF profiles, SPIFF tags, SPIFF colour spaces, APPn markers, SPIFF compression types and Registration Authorities (REGAUT) | Methods for registering some of the parameters used to extend JPEG
Part 5 | ISO/IEC FDIS 10918-5 | T.871 (05/11) | (under development since 2009) | | JPEG File Interchange Format (JFIF) | A popular format which has been the de facto file format for images encoded by the JPEG standard. In 2009, the JPEG Committee formally established an Ad Hoc Group to standardize JFIF as JPEG Part 5.

>*  JPEG compression
The compression method is usually lossy, meaning that some original image information is lost and cannot be restored, possibly affecting image quality. There is an optional lossless mode defined in the JPEG standard; however, this mode is not widely supported in products.
There is also an interlaced "Progressive JPEG" format, in which data are compressed in multiple passes of progressively higher detail. This is ideal for large images that will be displayed while downloading over a slow connection, allowing a reasonable preview after receiving only a portion of the data. However, progressive JPEGs are not as widely supported, and even some software which does support them (such as some versions of Internet Explorer) only displays the image once it has been completely downloaded.
There are also many medical imaging and traffic systems that create and process 12-bit JPEG images, normally grayscale images. The 12-bit JPEG format has been part of the JPEG specification for some time, but again, this format is not as widely supported.
>* JPEG files

The file format known as "JPEG Interchange Format" (JIF) is specified in Annex B of the standard. However, this "pure" file format is rarely used, primarily because of the difficulty of programming encoders and decoders that fully implement all aspects of the standard and because of certain shortcomings of the standard:
  • Color space definition
  • Component sub-sampling registration
  • Pixel aspect ratio definition


>*  Syntax and structure
A JPEG image consists of a sequence of segments, each beginning with a marker, each of which begins with a 0xFF byte followed by a byte indicating what kind of marker it is. Some markers consist of just those two bytes; others are followed by two bytes indicating the length of marker-specific payload data that follows. (The length includes the two bytes for the length, but not the two bytes for the marker.) Some markers are followed by entropy-coded data; the length of such a marker does not include the entropy-coded data. Note that consecutive 0xFF bytes are used as fill bytes for padding purposes (see JPEG specification section B.1.1.2 for details).
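The segment layout described above can be sketched as a small walker. This is a minimal illustration, not a full parser: the function name list_segments is hypothetical, the set of stand-alone markers is limited to SOI, EOI and RSTn, and the length field is read as big-endian, which the paragraph above does not state explicitly.

```python
# Markers that stand alone (no length field): RST0..RST7, SOI, EOI.
STANDALONE = set(range(0xD0, 0xD8)) | {0xD8, 0xD9}
SOS, EOI = 0xDA, 0xD9


def list_segments(data: bytes):
    """Return (marker_code, payload) pairs up to and including SOS or EOI."""
    segments = []
    pos = 0
    while pos + 1 < len(data):
        if data[pos] != 0xFF:
            raise ValueError(f"expected 0xFF at offset {pos}")
        if data[pos + 1] == 0xFF:
            pos += 1  # fill byte: skip it and look again
            continue
        marker = data[pos + 1]
        pos += 2
        if marker in STANDALONE:
            segments.append((marker, b""))
            if marker == EOI:
                break
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated length field")
        # Big-endian length; it counts its own two bytes but not the marker.
        length = int.from_bytes(data[pos:pos + 2], "big")
        segments.append((marker, data[pos + 2:pos + length]))
        pos += length
        if marker == SOS:
            break  # entropy-coded data follows and has no length field
    return segments


# A tiny hand-built stream: SOI, a COM segment holding "hi", then EOI.
sample = bytes([0xFF, 0xD8, 0xFF, 0xFE, 0x00, 0x04]) + b"hi" + bytes([0xFF, 0xD9])
for marker, payload in list_segments(sample):
    print(f"0xFF{marker:02X} payload={payload!r}")
```

The walker stops at SOS on purpose: finding the end of the entropy-coded data requires scanning for the next marker rather than reading a length.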
Within the entropy-coded data, after any 0xFF byte, a 0x00 byte is inserted by the encoder before the next byte, so that there does not appear to be a marker where none is intended, preventing framing errors. Decoders must skip this 0x00 byte. This technique, called byte stuffing (see JPEG specification section F.1.2.3), is only applied to the entropy-coded data, not to marker payload data.
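Byte stuffing can be illustrated in a few lines. This is a sketch only: the helper names stuff and unstuff are made up for this example, and the input is assumed to be pure entropy-coded data with no restart markers embedded in it.

```python
def stuff(entropy_coded: bytes) -> bytes:
    """Encoder side: insert a 0x00 byte after every 0xFF byte."""
    return entropy_coded.replace(b"\xff", b"\xff\x00")


def unstuff(stuffed: bytes) -> bytes:
    """Decoder side: drop the 0x00 byte that follows each 0xFF byte."""
    return stuffed.replace(b"\xff\x00", b"\xff")


raw = bytes([0x12, 0xFF, 0xD9, 0xFF, 0x00])
encoded = stuff(raw)
print(encoded.hex(" "))           # the FF D9 pair no longer looks like EOI
print(unstuff(encoded) == raw)    # round trip restores the original bytes
```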
Common JPEG markers
Short name | Bytes | Payload | Name | Comments
SOI | 0xFF, 0xD8 | none | Start Of Image |
SOF0 | 0xFF, 0xC0 | variable size | Start Of Frame (Baseline DCT) | Indicates that this is a baseline DCT-based JPEG, and specifies the width, height, number of components, and component subsampling (e.g., 4:2:0).
SOF2 | 0xFF, 0xC2 | variable size | Start Of Frame (Progressive DCT) | Indicates that this is a progressive DCT-based JPEG, and specifies the width, height, number of components, and component subsampling (e.g., 4:2:0).
DHT | 0xFF, 0xC4 | variable size | Define Huffman Table(s) | Specifies one or more Huffman tables.
DQT | 0xFF, 0xDB | variable size | Define Quantization Table(s) | Specifies one or more quantization tables.
DRI | 0xFF, 0xDD | 2 bytes | Define Restart Interval | Specifies the interval between RSTn markers, in macroblocks. This marker is followed by two bytes indicating the fixed size so it can be treated like any other variable size segment.
SOS | 0xFF, 0xDA | variable size | Start Of Scan | Begins a top-to-bottom scan of the image. In baseline DCT JPEG images, there is generally a single scan. Progressive DCT JPEG images usually contain multiple scans. This marker specifies which slice of data it will contain, and is immediately followed by entropy-coded data.
RSTn | 0xFF, 0xD0 … 0xD7 | none | Restart | Inserted every r macroblocks, where r is the restart interval set by a DRI marker. Not used if there was no DRI marker. The low 3 bits of the marker code cycle in value from 0 to 7.
APPn | 0xFF, 0xEn | variable size | Application-specific | For example, an Exif JPEG file uses an APP1 marker to store metadata, laid out in a structure based closely on TIFF.
COM | 0xFF, 0xFE | variable size | Comment | Contains a text comment.
EOI | 0xFF, 0xD9 | none | End Of Image |
There are other Start Of Frame markers that introduce other kinds of JPEG encodings.
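As an illustration of what a Start Of Frame segment carries, the sketch below decodes an SOF payload (the bytes after the two length bytes). The field layout used here (1-byte sample precision, 2-byte height, 2-byte width, 1-byte component count, then 3 bytes per component) comes from the JPEG specification rather than from the table above, and the function name parse_sof is hypothetical.

```python
import struct


def parse_sof(payload: bytes) -> dict:
    """Decode the frame header fields of an SOF0/SOF2 payload."""
    precision, height, width, ncomp = struct.unpack(">BHHB", payload[:6])
    components = []
    for i in range(ncomp):
        comp_id, sampling, qtable = payload[6 + 3 * i: 9 + 3 * i]
        components.append({
            "id": comp_id,
            "h": sampling >> 4,    # horizontal sampling factor (high nibble)
            "v": sampling & 0x0F,  # vertical sampling factor (low nibble)
            "qtable": qtable,      # quantization table selector
        })
    return {"precision": precision, "width": width, "height": height,
            "components": components}


# Hand-built payload: 8-bit, 640x480, three components with 4:2:0 subsampling.
payload = bytes([0x08, 0x01, 0xE0, 0x02, 0x80, 0x03,
                 0x01, 0x22, 0x00, 0x02, 0x11, 0x01, 0x03, 0x11, 0x01])
info = parse_sof(payload)
print(info["width"], info["height"], len(info["components"]))
```

In this example the first component has sampling factors 2x2 and the other two have 1x1, which is how 4:2:0 subsampling is usually expressed in the frame header.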
Since several vendors might use the same APPn marker type, application-specific markers often begin with a standard or vendor name (e.g., "Exif" or "Adobe") or some other identifying string.
At a restart marker, block-to-block predictor variables are reset, and the bitstream is synchronized to a byte boundary. Restart markers provide means for recovery after bitstream error, such as transmission over an unreliable network or file corruption. Since the runs of macroblocks between restart markers may be independently decoded, these runs may be decoded in parallel.
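Because byte stuffing guarantees that a 0xFF inside entropy-coded data is always followed by 0x00, restart markers can be located with a plain byte search. The sketch below splits scan data into independently decodable runs; the function names are hypothetical and the input is assumed to be the raw bytes between SOS and the next non-restart marker.

```python
import re

# RST0..RST7 are the byte pairs FF D0 through FF D7.
RST = re.compile(rb"\xff[\xd0-\xd7]")


def split_at_restarts(scan_data: bytes):
    """Return the runs of entropy-coded data between restart markers."""
    return RST.split(scan_data)


def restart_indices(scan_data: bytes):
    """Return the low 3 bits of each restart marker, in stream order."""
    return [match[1] & 0x07 for match in RST.findall(scan_data)]


data = b"\x01\x02\xff\xd0\x03\xff\x00\x04\xff\xd1\x05"
print(split_at_restarts(data))   # three runs; the stuffed FF 00 stays inside its run
print(restart_indices(data))     # expected to count 0, 1, 2, ... modulo 8
```

Each run could then be handed to a separate worker, since the predictor state is reset at every restart marker.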

>*  JPEG codec example
       * Compression ratio and artifacts
       
The resulting compression ratio can be varied according to need by being more or less aggressive in the divisors used in the quantization phase. Ten to one compression usually results in an image that cannot be distinguished by eye from the original. 100 to one compression is usually possible, but will look distinctly artifacted compared to the original. The appropriate level of compression depends on the use to which the image will be put.
JPEG images can show irregularities known as compression artifacts, which may take the form of noise around contrasting edges (especially curves and corners), or blocky images, commonly known as 'jaggies'. These are due to the quantization step of the JPEG algorithm. They are especially noticeable around sharp corners between contrasting colors (text is a good example as it contains many such corners). The analogous artifacts in MPEG video are referred to as mosquito noise, as the resulting "edge busyness" and spurious dots, which change over time, resemble mosquitoes swarming around the object.
These artifacts can be reduced by choosing a lower level of compression; they may be eliminated by saving an image using a lossless file format, though for photographic images this will usually result in a larger file size. The images created with ray-tracing programs have noticeable blocky shapes on the terrain. Certain low-intensity compression artifacts might be acceptable when simply viewing the images, but can be emphasized if the image is subsequently processed, usually resulting in unacceptable quality. Consider the example below, demonstrating the effect of lossy compression on an edge detection processing step.



Image | Lossless compression | Lossy compression
Original | Lossless-circle.png | Lossy-circle.jpg
Processed by Canny edge detector | Lossless-circle-canny.png | Lossy-circle-canny.png
Some programs allow the user to vary the amount by which individual blocks are compressed. Stronger compression is applied to areas of the image that show fewer artifacts. This way it is possible to manually reduce JPEG file size with less loss of quality.
JPEG artifacts, like pixelation, are occasionally intentionally exploited for artistic purposes, as in Jpegs, by German photographer Thomas Ruff.
Since the quantization stage always results in a loss of information, the DCT-based JPEG codec is always lossy. (Information is lost both in quantizing and in rounding of the floating-point numbers.) Even if the quantization matrix is a matrix of ones, information will still be lost in the rounding step. The optional lossless mode mentioned earlier avoids this only because it does not use the quantization stage.
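The loss introduced by quantization can be seen with a few numbers. This is a simplified sketch: the coefficient values are made up, and a single divisor is used for every coefficient, whereas a real encoder applies a matrix with one divisor per coefficient position.

```python
def quantize(coefficients, divisor):
    """Divide each coefficient by the divisor and round to an integer."""
    return [round(c / divisor) for c in coefficients]


def dequantize(levels, divisor):
    """Multiply back; the fraction discarded by rounding cannot be recovered."""
    return [q * divisor for q in levels]


coefficients = [-415.38, -30.19, 61.2, 27.24, 2.6]  # hypothetical DCT output

coarse = quantize(coefficients, 16)
print(coarse)                  # small values collapse to 0, which compresses well
print(dequantize(coarse, 16))  # reconstruction differs from the input

fine = quantize(coefficients, 1)
print(fine)                    # divisor 1 still drops the fractional parts
```

A larger divisor produces more zeros and therefore a smaller file, at the cost of a larger reconstruction error.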







Saturday, 17 March 2012

MAGIC NUMBER



Magic numbers are common in programs across many operating systems. Magic numbers implement strongly typed data and are a form of in-band signaling to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many file formats and can yield further run-time information.


Some examples:

  • Compiled Java class files (bytecode) start with hex CAFEBABE. When compressed with Pack200 the bytes are changed to CAFED00D.
  • GIF image files have the ASCII code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61)
  • JPEG image files begin with FF D8 and end with FF D9. JPEG/JFIF files contain the ASCII code for "JFIF" (4A 46 49 46) as a null terminated string. JPEG/Exif files contain the ASCII code for "Exif" (45 78 69 66) also as a null terminated string, followed by more metadata about the file.
  • PNG image files begin with an 8-byte signature which identifies the file as a PNG file and allows detection of common file transfer problems: \211 P N G \r \n \032 \n (89 50 4E 47 0D 0A 1A 0A). That signature contains various newline characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ASCII transfer mode instead of the binary mode.
  • Standard MIDI music files have the ASCII code for "MThd" (4D 54 68 64) followed by more metadata.
  • Unix script files usually start with a shebang, "#!" (23 21) followed by the path to an interpreter.
  • PostScript files and programs start with "%!" (25 21).
  • PDF files start with "%PDF" (hex 25 50 44 46).
  • MS-DOS EXE files and the EXE stub of the Microsoft Windows PE (Portable Executable) files start with the characters "MZ" (4D 5A), the initials of the designer of the file format, Mark Zbikowski. The definition allows "ZM" (5A 4D) as well, but this is quite uncommon.
  • The Berkeley Fast File System superblock format is identified as either 19 54 01 19 or 01 19 54 depending on version; both represent the birthday of the author, Marshall Kirk McKusick.
  • The Master Boot Record of bootable storage devices on almost all IA-32 IBM PC compatibles has a code of AA 55 as its last two bytes.
  • Executables for the Game Boy and Game Boy Advance handheld video game systems have a 48-byte or 156-byte magic number, respectively, at a fixed spot in the header. This magic number encodes a bitmap of the Nintendo logo.
  • Amiga software executable Hunk files running on Amiga classic 68000 machines all started with the hexadecimal number $000003f3, nicknamed the "Magic Cookie."
  • Amiga's black screen of death called Guru Meditation, in its first version, when the machine hung up for uncertain reasons, showed the hexadecimal number 48454C50, which stands for "HELP" in hexadecimal ASCII characters (48=H, 45=E, 4C=L, 50=P).
  • In the Amiga, the only absolute address in the system is hex $0000 0004 (memory location 4), which contains the start location called SysBase, a pointer to exec.library, the so-called kernel of Amiga.
  • PEF files, used by Mac OS and BeOS for PowerPC executables, contain the ASCII code for "Joy!" (4A 6F 79 21) as a prefix.
  • TIFF files begin with either II or MM followed by 42 as a two-byte integer in little or big endian byte ordering. II is for Intel, which uses little endian byte ordering, so the magic number is 49 49 2A 00. MM is for Motorola, which uses big endian byte ordering, so the magic number is 4D 4D 00 2A.
  • Unicode text files encoded in UTF-16 often start with the Byte Order Mark to detect endianness (FE FF for big endian and FF FE for little endian). UTF-8 text files often start with the UTF-8 encoding of the same character, EF BB BF.
  • LLVM Bitcode files start with BC (0x42, 0x43)
  • WAD files start with IWAD or PWAD (for Doom), WAD2 (for Quake) and WAD3 (for Half-Life).
  • Microsoft Office document files start with D0 CF 11 E0, which is visually suggestive of the word "DOCFILE0".
  • Headers in ZIP files begin with "PK" (50 4B), the initials of Phil Katz, author of DOS compression utility PKZIP.
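
Detecting a file type from its leading bytes can be sketched with a small lookup table built from some of the signatures listed above. This is an illustration only: the names SIGNATURES and sniff are invented for this example, the table is far from complete, and a matching prefix does not prove that the rest of the file is valid.

```python
# (prefix bytes, description) pairs taken from the list above.
SIGNATURES = [
    (b"\xca\xfe\xba\xbe", "Java class file"),
    (b"GIF89a", "GIF image"),
    (b"GIF87a", "GIF image"),
    (b"\xff\xd8", "JPEG image"),
    (b"\x89PNG\r\n\x1a\n", "PNG image"),
    (b"MThd", "Standard MIDI file"),
    (b"%PDF", "PDF document"),
    (b"%!", "PostScript"),
    (b"#!", "Unix script"),
    (b"MZ", "MS-DOS or PE executable"),
    (b"II*\x00", "TIFF image (little endian)"),
    (b"MM\x00*", "TIFF image (big endian)"),
    (b"PK", "ZIP archive"),
]


def sniff(header: bytes) -> str:
    """Return a description for the first signature that prefixes header."""
    for magic, description in SIGNATURES:
        if header.startswith(magic):
            return description
    return "unknown"


print(sniff(b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"))
print(sniff(b"II*\x00\x08\x00\x00\x00"))
print(sniff(b"plain text"))
```

In practice the header would come from reading the first few bytes of a file opened in binary mode, for example open(path, "rb").read(16), where path is whatever file is being inspected.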