
and functionality improvements (including support for measuring coverage), primitive argument-checking fixes, and object-file changes resulting in reduced load times (and some backward incompatibility):
- annotations are now preserved in object files for debug only, for profiling only, for both, or not at all, depending on the settings of generate-inspector-information and compile-profile. in particular, when inspector information is not enabled but profiling is, source information does not leak into error messages and inspector output, though it is still available via the profile tools. The mechanics of this involved repurposing the fasl a? parameter to hold an annotation flags value when it is not #f and remaking annotations with new flags if necessary before emitting them.
    compile.ss, fasl.ss, misc.ms
- altered a number of mats to produce correct results even when the 's' directory is profiled.
    misc.ms, cp0.ms, record.ms
- profile-release-counters is now generation-friendly; that is, it doesn't look for dropped code objects in generations that have not been collected since the last call to profile-release-counters. also, it no longer allocates memory when it releases counters.
    pdhtml.ss, gc.c, gcwrapper.c, globals.h, prim5.c
- removed unused entry points S_ifile, S_ofile, and S_iofile
    alloc.c, externs.h
- mats that test loading profile info into the compiler's database to guide optimization now weed out preexisting entries, in case the 's' directory is profiled.
    4.ms, mat.ss, misc.ms, primvars.ms
- counters for dropped code objects are now released at the start of each mat group.
    mat.ss
- replaced ehc (enable-heap-check) option with hci (heap-check-interval) option that allows heap checks to be performed periodically rather than on each collection. hci=0 is equivalent to ehc=f (disabling heap checks) and hci=1 is equivalent to ehc=t (enabling heap checks every collection), while hci=100 enables heap checks only every 100th collection. allx and bullyx mats use this feature to reduce heap-checking overhead to a more reasonable level. this is particularly important when the 's' directory is profiled, since the amount of static memory to be checked is greatly increased due to the counters.
    mats/Mf-base, mat.ss, primvars.ms
- added a mat that calls #%show-allocation, which was otherwise not being tested.
    misc.ms
- removed a broken primvars mat and updated two others. in each case, the mat was looking for information about primitives in the wrong (i.e., old) place and silently succeeding when it didn't find any primitives to test. the revised mats (along with a few others) now check to make sure at least one identifier has the information they look for. the removed mat was checking for library information that is now compiled in, so the mat is now unnecessary. the others were (not) doing argument-error checks. fixing these turned up a handful of problems that have also been fixed: a couple of unbound variables in the mat driver, two broken primdata declarations, a tardy argument check by profile-load-data, and a bug in char-ready?, which was requiring an argument rather than defaulting it to the current input port.
    primdata.ss, pdhtml.ss, io.ms, primdvars.ms, 4.ms, 6.ms, misc.ms, patch*
- added initial support for recording coverage information. when the new parameter generate-covin-files is set, the compiler generates .covin files containing the universe of all source objects for which profile forms are present in the expander output. when profiling and generation of covin files are enabled in the 's' directory, the mats optionally generate .covout files for each mat file giving the subset of the universe covered by the mat file, along with an all.covout in each mat output directory aggregating the coverage for the directory and another all.covout in the top-level mat directory aggregating the coverage for all directories.
    back.ss, compile.ss, cprep.ss, primdata.ss, s/Mf-base, mat.ss, mats/Mf-base, mats/primvars.ms
- support for generating covout files is now built in. with-coverage-output gathers and dumps coverage information, and aggregate-coverage-output combines (aggregates) covout files.
    pdhtml.ss, primdata.ss, compile.ss, mat.ss, mats/Mf-base, primvars.ms
- profile-clear now adjusts active coverage trackers to avoid losing coverage information.
    pdhtml.ss, prim5.c
- nested with-coverage calls are now supported.
    pdhtml.ss
- switched to a more compact representation for covin and covout files; reduces disk space (compressed or not) by about a factor of four and read time by about a factor of two with no increase in write time.
    primdata.ss, pdhtml.ss, cprep.ss, compile.ss, mat.ss, mats/Mf-base
- added support for determining coverage for an entire run, including coverage for expressions hit during boot time. 'all' mats now produce run.covout files in each output directory, and 'allx' mats produce an aggregate run.covout file in the mat directory.
    pdhtml.ss, mat.ss, mats/Mf-base
- profile-release-counters now adjusts active coverage trackers to account for the counters that have been released.
    pdhtml.ss, prim5.c
- replaced the artificial "examples" target with a real "build-examples" target so make won't think it always has to rebuild mats that depend upon the examples directory having been compiled. mats make clean now runs make clean in the examples directory.
    mats/Mf-base
- importing a library from an object file now just visits the object file rather than doing a full load so that the run-time code for the library is not retained. The run-time code is still read because the current fasl format forces the entire file to be read, but not retaining the code can lower heap size and garbage-collection cost, particularly when many object-code libraries are imported. The downside is that the file must be revisited if the run-time code turns out to be required. This change exposed several places where the code was failing to check if a revisit is needed.
    syntax.ss, 7.ms, 8.ms, misc.ms, root-experr*
- fixed typos: was passing unquoted load rather than quoted load to $load-library along one path (where it is loading source code and therefore irrelevant), and was reporting src-path rather than obj-path in a message about failing to define a library.
    syntax.ss
- compile-file and friends now put all recompile information in the first fasl object after the header so the library manager can find it without loading the entire fasl file. The library manager now does so. It also now checks to see if library object files need to be recreated before loading them rather than loading them and possibly recompiling them after discovering they are out of date, since the latter requires loading the full object file even if it's out of date, while the former takes advantage of the ability to extract just recompile information. as well as reducing overhead, this eliminates possibly undesirable side effects, such as creation and registration of out-of-date nongenerative record-type descriptors. because the library manager expects to find recompile information at the front of an object file, it will not find all recompile information if object files are "catted" together. also, compile-file has to hold in memory the object code for all expressions in the file so that it can emit the unified recompile information, rather than writing to the object file incrementally, which can significantly increase the memory required to compile a large file full of individual top-level forms. This does not affect top-level programs, which were already handled as a whole, or a typical library file that contains just a single library form.
    compile.ss, syntax.ss
- the library manager now checks include files before library dependencies when compile-imported-libraries is false (as it already did when compile-imported-libraries is true) in case a source change affects the set of imported libraries. (A library change can affect the set of include files as well, but checking dependencies before include files can cause unneeded libraries to be loaded.) The include-file check is based on recompile-info rather than dependencies, but the library checks are still based on dependencies.
    syntax.ss
- fixed check for binding of scheme-version. (the check prevents premature treatment of recompile-info records as Lexpand forms to be passed to $interpret-backend.)
    scheme.c
- strip-fasl-file now preserves recompile-info when compile-time info is stripped.
    strip.ss
- removed include-req* from library/ct-info and ctdesc records; it is no longer needed now that all recompile information is maintained separately.
    expand-lang.ss, syntax.ss, compile.ss, cprep.ss, syntax.ss
- changed the fasl format and reworked a lot of code in the expander, compiler, fasl writer, and fasl reader to allow the fasl reader to skip past run-time information when it isn't needed and compile-time information when it isn't needed. Skipping past still involves reading and decoding when encrypted, but the fasl reader no longer parses or allocates code and data in the portions to be skipped. Side effects of associating record uids with rtds are also avoided, as are the side effects of interning symbols present only in the skipped data. Skipping past code objects also reduces or eliminates the need to synchronize data and instruction caches. Since the fasl reader no longer returns compile-time (visit) or run-time (revisit) code and data when not needed, the fasl reader no longer wraps these objects in a pair with a 0 or 1 visit or revisit marker. To support this change, the fasl writer generates separate top-level fasl entries (and graphs) for separate forms in the same top-level source form (e.g., begin or library). This reliably breaks eq-ness of shared structure across these forms, which was previously broken only when visit or revisit code was loaded at different times (this is an incompatible change). Because of the change, fasl "groups" are no longer needed, so they are no longer handled.
    7.ss, cmacros.ss, compile.ss, expand-lang.ss, strip.ss, externs.h, fasl.c, scheme.c, hash.ms
- the change above is surfaced in an optional fasl-read "situation" argument (visit, revisit, or load). The default is load. visit causes it to skip past revisit code and data; revisit causes it to skip past visit code and data; and load causes it not to skip past either. visit-revisit data produced by (eval-when (visit revisit) ---) is never skipped.
    7.ss, primdata.ss, io.stex
- to improve compile-time and run-time error checking, the Lexpand recompile-info, library/rt-info, library/ct-info, and program-info forms have been replaced with list-structured forms, e.g., (recompile-info ,rcinfo).
    expand-lang.ss, compile.ss, cprep.ss, interpret.ss, syntax.ss
- added visit-compiled-from-port and revisit-compiled-from-port to complement the existing load-compiled-from-port.
    7.ss, primdata.ss, 7.ms, system.stex
- increased amount read when seeking an lz4-compressed input file from 32 to 1024 bytes at a time
    compress-io.c
- replaced the fasl a? parameter value #t with an "all" flag value so its value is consistently a mask.
    cmacros.ss, fasl.ss, compile.ss
- split off profile mats into a separate file
    misc.ms, profile.ms (new), root-experr*, mats/Mf-base
- added coverage percent computations to mat allx/bullyx output
    mat.ss, mats/Mf-base, primvars.ms
- replaced coverage tables with more generic and generally useful source tables, which map source objects to arbitrary values.
    pdhtml.ss, compile.ss, cprep.ss, primdata.ss, mat.ss, mats/Mf-base, primvars.ms, profile.ms, syntax.stex
- reduced profile counting overhead by using calls to fold-left instead of calls to apply and map and by using fixnum operations for profile counts on 64-bit machines.
    pdhtml.ss
- used a critical section to fix a race condition in the calculations of profile counts that sometimes resulted in bogus (including negative) counts, especially when the 's' directory is profiled.
    pdhtml.ss
- added discard flag to declaration for hashtable-size
    primdata.ss
- redesigned the printed representation of source tables and rewrote get-source-table! to read and store incrementally to reduce memory overhead.
    compile.ss
- added generate-covin-files to the set of parameters preserved by compile-file, etc.
    compile.ss, system.stex
- moved covop argument before the undocumented machine and hostop arguments to compile-port and compile-to-port. removed the undocumented ofn argument from compile-to-port; using (port-name ip) instead.
    compile.ss, primdata.ss, 7.ms, system.stex
- compile-port now tries to come up with a file position to supply to make-read, which it can do if the port's positions are character positions (presently string ports) or if the port is positioned at zero.
    compile.ss
- audited the argument-type-error fuzz mat exceptions and fixed a host of problems this turned up (entries follow). added #f as an invalid argument for every type for which #f is indeed invalid to catch places where the maybe- prefix was missing on the argument type. the mat tries hard to determine if the condition raised (if any) as the result of an invalid argument is appropriate and redirects the remainder to the mat-output (.mo) file prefixed with 'Expected error', causing them to show up in the expected error output so developers will be encouraged to audit them in the future.
    primvars.ms, mat.ss
- added an initial symbol? test on machine type names so we produce an invalid machine type error message rather than something confusing like "machine type #f is not supported".
    compile.ss
- fixed declarations for many primitives that were specified as accepting arguments of more general types than they actually accept, such as number -> real for various numeric operations, symbol -> endianness for various bytevector operations, time -> time-utc for time-utc->date, and list -> list-of-string-pairs for default-library-search-handler. also replaced some of the sub-xxxx types with specific types such as sub-symbol -> endianness in utf16->string, but only where they were causing issues with the primvars argument-type-error fuzz mat. (this should be done more generally.)
    primdata.ss
- fixed incorrect who arguments (was map instead of fold-right, current-date instead of time-utc->date); switched to using define-who/set-who! generally.
    4.ss, date.ss
- append! now checks all arguments before any mutation
    5_2.ss
- with-source-path now properly supplies itself as who for the string? argument check; callers like load now do their own checks.
    7.ss
- added missing integer? check to $fold-bytevector-native-ref whose lack could have resulted in a compile-time error.
    cp0.ss
- fixed typo in output-port-buffer-mode error message
    io.ss
- fixed who argument (was fx< rather than fx<?)
    library.ss
- fixed declaration of first source-file-descriptor argument (was sfd, now string)
    primdata.ss
- added missing article 'a' in a few error messages
    prims.ss
- fixed the copy-environment argument-type error message for the list of symbols argument.
    syntax.ss
- the environment procedure now catches exceptions that occur and reraises the exception with itself as who if the condition isn't already a who condition.
    syntax.ss
- updated experr and allx patch files for changes to argument-count fuzz mat and fixes for problems turned up by them.
    root-experr*, patch*
- fixed a couple of issues setting port sizes: string and bytevector output port put handlers don't need room to store the character or byte, so they now set the size to the buffer length rather than one less. binary-file-port-clear-output now sets the index rather than size to zero; setting the size to zero is inappropriate for some types of ports and could result in loss of buffering and even suppression of future output. removed a couple of redundant sets of the size that occur immediately after setting the buffer.
    io.ss
- it is now possible to return from a call to with-profile-tracker multiple times and not double-count (or worse) any counts.
    pdhtml.ss, profile.ms
- read-token now requires a file position when it is handed a source-file descriptor (since the source-file descriptor isn't otherwise useful), and the source-file descriptor argument can no longer be #f. the input file position plays the same role as the input file position in get-datum/annotations. these extra read-token arguments are now documented.
    read.ss, 6.ms, io.stex
- the source-file descriptor argument to get-datum/annotations can no longer be #f. it was already documented that way.
    read.ss
- read-token and do-read now look for the character-positions port flag before asking if the port has port-position, since the latter is slightly more expensive.
    read.ss
- rd-error now reports the current port position if it can be determined when fp isn't already set, i.e., when reading from a port without character positions (presently any non string port) and fp has not been passed in explicitly (to read-token or get-datum/annotations). the port position might not be a character position, but it should be better than nothing.
    read.ss
- added comment noting an invariant for s_profile_release_counters.
    prim5.c
- restored accidentally dropped fasl-write formdef and dropped duplicate fasl-read formdef
    io.stex
- added a 'coverage' target that tests the coverage of the Scheme-code portions of Chez Scheme by the mats.
    Makefile.in, Makefile-workarea.in
- added .PHONY declarations for all of the targets in the top-level and workarea make files, and renamed the create-bintar, create-rpm, and create-pkg targets bintar, rpm, and pkg.
    Makefile.in, Makefile-workarea.in
- added missing --retain-static-relocation command-line argument and updated the date
    scheme.1.in
- removed a few redundant conditional variable settings
    configure
- fixed declaration of condition wait (timeout -> maybe-timeout)
    primdata.ss

original commit: 88501743001393fa82e89c90da9185fc0086fbcb
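
The compress-io.c entry above raises the amount read per step when seeking forward in an lz4 input file from 32 to 1024 bytes. The following is a minimal, self-contained sketch of why the chunk size matters. It is not the code from compress-io.c: `demo_stream`, `demo_read`, and `demo_skip_to` are hypothetical names, and an in-memory buffer stands in for the decompressing reader. It seeks forward by reading and discarding data, and counts how many read calls each chunk size needs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory stand-in for a decompressing reader: it hands out
   bytes sequentially and tracks how many have been produced so far. */
typedef struct {
  const char *data;
  size_t len;
  size_t stream_pos;
  unsigned reads;    /* number of read calls made, to compare chunk sizes */
} demo_stream;

/* Copy up to count bytes into buf; returns the number of bytes delivered,
   which is 0 once the data is exhausted. */
static int demo_read(demo_stream *s, void *buf, size_t count) {
  size_t avail = s->len - s->stream_pos;
  if (count > avail) count = avail;
  memcpy(buf, s->data + s->stream_pos, count);
  s->stream_pos += count;
  s->reads++;
  return (int)count;
}

/* Seek forward by reading and discarding at most chunk_size bytes per call.
   Returns the resulting position, which falls short of target if the data
   ends first. */
static size_t demo_skip_to(demo_stream *s, size_t target, size_t chunk_size) {
  char scratch[1024];
  assert(chunk_size > 0 && chunk_size <= sizeof(scratch));
  while (s->stream_pos < target) {
    size_t amt = target - s->stream_pos;
    if (amt > chunk_size) amt = chunk_size;
    if (demo_read(s, scratch, amt) <= 0) break;  /* end of data: stop */
  }
  return s->stream_pos;
}
```

In this sketch, skipping 3000 bytes takes 94 read calls with a 32-byte chunk and 3 with a 1024-byte chunk. The early exit at end of data is a choice made for the sketch, not a statement about how the real seek routine behaves.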
673 lines
18 KiB
C
/* compress-io.c
 * Copyright 1984-2019 Cisco Systems, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

/* Dispatch to zlib or LZ4 */

#include "system.h"
#include "zlib.h"
#include "lz4.h"
#include "lz4frame.h"
#include "lz4hc.h"
#include <fcntl.h>
#include <errno.h>

#ifdef WIN32
#include <io.h>
# define WIN32_IZE(id) _ ## id
# define GLZ_O_BINARY O_BINARY
#else
# define WIN32_IZE(id) id
# define GLZ_O_BINARY 0
#endif

/* the value of LZ4_OUTPUT_PORT_IN_BUFFER_SIZE was determined
   through experimentation on an intel linux server and an intel
   osx laptop.  smaller sizes result in significantly worse compression
   of object files, and larger sizes don't have much beneficial effect.
   don't increase the output-port in-buffer size unless you're sure
   it reduces object-file size or reduces compression time
   significantly.  don't decrease it unless you're sure it doesn't
   increase object-file size significantly.  one buffer of size
   LZ4_OUTPUT_PORT_IN_BUFFER_SIZE is allocated per lz4-compressed
   output port.  another buffer of a closely related size is allocated
   per thread. */
#define LZ4_OUTPUT_PORT_IN_BUFFER_SIZE (1 << 18)

/* the values we choose for LZ4_INPUT_PORT_IN_BUFFER_SIZE and
   LZ4_INPUT_PORT_OUT_BUFFER_SIZE don't seem to make much difference
   in decompression speed, so we keep them fairly small.  one buffer
   of size LZ4_INPUT_PORT_IN_BUFFER_SIZE and one buffer of size
   LZ4_INPUT_PORT_OUT_BUFFER_SIZE are allocated per lz4-compressed
   input port. */
#define LZ4_INPUT_PORT_IN_BUFFER_SIZE (1 << 12)
#define LZ4_INPUT_PORT_OUT_BUFFER_SIZE (1 << 14)

typedef struct lz4File_out_r {
  LZ4F_preferences_t preferences;
  INT fd;
  INT out_buffer_size;
  INT in_pos;
  INT err;
  size_t stream_pos;
  char in_buffer[LZ4_OUTPUT_PORT_IN_BUFFER_SIZE];
} lz4File_out;

typedef struct lz4File_in_r {
  INT fd;
  LZ4F_dctx *dctx;
  INT in_pos, in_len, out_pos, out_len;
  INT frame_ended;
  INT err;
  size_t stream_pos;
  off_t init_pos;
  char in_buffer[LZ4_INPUT_PORT_IN_BUFFER_SIZE];
  char out_buffer[LZ4_INPUT_PORT_OUT_BUFFER_SIZE];
} lz4File_in;

typedef struct sized_buffer_r {
  INT size;
  char buffer[0];
} sized_buffer;

/* local functions */
static glzFile glzdopen_output_gz(INT fd, INT compress_level);
static glzFile glzdopen_output_lz4(INT fd, INT compress_level);
static glzFile glzdopen_input_gz(INT fd);
static glzFile glzdopen_input_lz4(INT fd, off_t init_pos);
static INT glzread_lz4(lz4File_in *lz4, void *buffer, UINT count);
static INT glzemit_lz4(lz4File_out *lz4, void *buffer, UINT count);
static INT glzwrite_lz4(lz4File_out *lz4, void *buffer, UINT count);

static glzFile glzdopen_output_gz(INT fd, INT compress_level) {
  gzFile gz;
  glzFile glz;
  INT as_append;
  INT level;

#ifdef WIN32
  as_append = 0;
#else
  as_append = fcntl(fd, F_GETFL) & O_APPEND;
#endif

  if ((gz = gzdopen(fd, as_append ? "ab" : "wb")) == Z_NULL) return Z_NULL;

  switch (compress_level) {
  case COMPRESS_LOW:
    level = Z_BEST_SPEED;
    break;
  case COMPRESS_MEDIUM:
    level = (Z_BEST_SPEED + Z_BEST_COMPRESSION) / 2;
    break;
  case COMPRESS_HIGH:
    level = (Z_BEST_SPEED + (3 * Z_BEST_COMPRESSION)) / 4;
    break;
  case COMPRESS_MAX:
    level = Z_BEST_COMPRESSION;
    break;
  default:
    S_error1("glzdopen_output_gz", "unexpected compress level ~s", Sinteger(compress_level));
    level = 0;
    break;
  }

  gzsetparams(gz, level, Z_DEFAULT_STRATEGY);

  if ((glz = malloc(sizeof(struct glzFile_r))) == NULL) {
    (void)gzclose(gz);
    return Z_NULL;
  }
  glz->fd = fd;
  glz->inputp = 0;
  glz->format = COMPRESS_GZIP;
  glz->gz = gz;
  return glz;
}

static glzFile glzdopen_output_lz4(INT fd, INT compress_level) {
  glzFile glz;
  lz4File_out *lz4;
  INT level;

  switch (compress_level) {
  case COMPRESS_LOW:
    level = 1;
    break;
  case COMPRESS_MEDIUM:
    level = LZ4HC_CLEVEL_MIN;
    break;
  case COMPRESS_HIGH:
    level = (LZ4HC_CLEVEL_MIN + LZ4HC_CLEVEL_MAX) / 2;
    break;
  case COMPRESS_MAX:
    level = LZ4HC_CLEVEL_MAX;
    break;
  default:
    S_error1("glzdopen_output_lz4", "unexpected compress level ~s", Sinteger(compress_level));
    level = 0;
    break;
  }

  if ((lz4 = malloc(sizeof(lz4File_out))) == NULL) return Z_NULL;
  memset(&lz4->preferences, 0, sizeof(LZ4F_preferences_t));
  lz4->preferences.compressionLevel = level;
  lz4->fd = fd;
  lz4->out_buffer_size = (INT)LZ4F_compressFrameBound(LZ4_OUTPUT_PORT_IN_BUFFER_SIZE, &lz4->preferences);
  lz4->in_pos = 0;
  lz4->err = 0;
  lz4->stream_pos = 0;

  if ((glz = malloc(sizeof(struct glzFile_r))) == NULL) {
    free(lz4);
    return Z_NULL;
  }
  glz->fd = fd;
  glz->inputp = 0;
  glz->format = COMPRESS_LZ4;
  glz->lz4_out = lz4;
  return glz;
}

glzFile S_glzdopen_output(INT fd, INT compress_format, INT compress_level) {
  switch (compress_format) {
  case COMPRESS_GZIP:
    return glzdopen_output_gz(fd, compress_level);
  case COMPRESS_LZ4:
    return glzdopen_output_lz4(fd, compress_level);
  default:
    S_error1("glzdopen_output", "unexpected compress format ~s", Sinteger(compress_format));
    return Z_NULL;
  }
}

static glzFile glzdopen_input_gz(INT fd) {
  gzFile gz;
  glzFile glz;

  if ((gz = gzdopen(fd, "rb")) == Z_NULL) return Z_NULL;

  if ((glz = malloc(sizeof(struct glzFile_r))) == NULL) {
    (void)gzclose(gz);
    return Z_NULL;
  }
  glz->fd = fd;
  glz->inputp = 1;
  glz->format = COMPRESS_GZIP;
  glz->gz = gz;
  return glz;
}

static glzFile glzdopen_input_lz4(INT fd, off_t init_pos) {
  glzFile glz;
  LZ4F_dctx *dctx;
  LZ4F_errorCode_t r;
  lz4File_in *lz4;

  r = LZ4F_createDecompressionContext(&dctx, LZ4F_VERSION);
  if (LZ4F_isError(r))
    return Z_NULL;

  if ((lz4 = malloc(sizeof(lz4File_in))) == NULL) {
    (void)LZ4F_freeDecompressionContext(dctx);
    return Z_NULL;
  }
  lz4->fd = fd;
  lz4->dctx = dctx;
  lz4->in_pos = 0;
  lz4->in_len = 0;
  lz4->out_len = 0;
  lz4->out_pos = 0;
  lz4->frame_ended = 0;
  lz4->err = 0;
  lz4->stream_pos = 0;
  lz4->init_pos = init_pos;

  if ((glz = malloc(sizeof(struct glzFile_r))) == NULL) {
    (void)LZ4F_freeDecompressionContext(lz4->dctx);
    free(lz4);
    return Z_NULL;
  }
  glz->fd = fd;
  glz->inputp = 1;
  glz->format = COMPRESS_LZ4;
  glz->lz4_in = lz4;
  return glz;
}

glzFile S_glzdopen_input(INT fd) {
  INT r, pos = 0;
  unsigned char buffer[4];
  off_t init_pos;

  /* check for LZ4 magic number, otherwise defer to gzdopen */

  if ((init_pos = WIN32_IZE(lseek)(fd, 0, SEEK_CUR)) == -1) return Z_NULL;

  while (pos < 4) {
    r = WIN32_IZE(read)(fd, (char*)buffer + pos, 4 - pos);
    if (r == 0)
      break;
    else if (r > 0)
      pos += r;
#ifdef EINTR
    else if (errno == EINTR)
      continue;
#endif
    else
      break; /* error reading */
  }

  if (pos > 0) {
    if (WIN32_IZE(lseek)(fd, init_pos, SEEK_SET) == -1) return Z_NULL;
  }

  if ((pos == 4)
      && (buffer[0] == 0x04)
      && (buffer[1] == 0x22)
      && (buffer[2] == 0x4d)
      && (buffer[3] == 0x18))
    return glzdopen_input_lz4(fd, init_pos);

  return glzdopen_input_gz(fd);
}

glzFile S_glzopen_input(const char *path) {
  INT fd;

  fd = WIN32_IZE(open)(path, O_RDONLY | GLZ_O_BINARY);

  if (fd == -1)
    return Z_NULL;
  else
    return S_glzdopen_input(fd);
}

#ifdef WIN32
glzFile S_glzopen_input_w(const wchar_t *path) {
  INT fd;

  fd = _wopen(path, O_RDONLY | GLZ_O_BINARY);

  if (fd == -1)
    return Z_NULL;
  else
    return S_glzdopen_input(fd);
}
#endif

IBOOL S_glzdirect(glzFile glz) {
  if (glz->format == COMPRESS_GZIP)
    return gzdirect(glz->gz);
  else
    return 0;
}

INT S_glzclose(glzFile glz) {
  INT r = Z_OK, saved_errno = 0;
  switch (glz->format) {
  case COMPRESS_GZIP:
    r = gzclose(glz->gz);
    break;
  case COMPRESS_LZ4:
    if (glz->inputp) {
      lz4File_in *lz4 = glz->lz4_in;
      while (1) {
        /* use a separate name so the outer result r is not shadowed */
        INT r1 = WIN32_IZE(close)(lz4->fd);
#ifdef EINTR
        if (r1 < 0 && errno == EINTR) continue;
#endif
        /* report a failed close, as the output case below does; errno is
           meaningful only when close fails */
        if (r1 < 0) { r = Z_ERRNO; saved_errno = errno; }
        break;
      }
      (void)LZ4F_freeDecompressionContext(lz4->dctx);
      free(lz4);
    } else {
      lz4File_out *lz4 = glz->lz4_out;
      if (lz4->in_pos != 0) {
        r = glzemit_lz4(lz4, lz4->in_buffer, lz4->in_pos);
        if (r >= 0) r = Z_OK; else { r = Z_ERRNO; saved_errno = errno; }
      }
      while (1) {
        int r1 = WIN32_IZE(close)(lz4->fd);
#ifdef EINTR
        if (r1 < 0 && errno == EINTR) continue;
#endif
        if (r == Z_OK && r1 < 0) { r = Z_ERRNO; saved_errno = errno; }
        break;
      }
      free(lz4);
    }
    break;
  default:
    S_error1("S_glzclose", "unexpected compress format ~s", Sinteger(glz->format));
  }
  free(glz);
  if (saved_errno) errno = saved_errno;
  return r;
}

static INT glzread_lz4(lz4File_in *lz4, void *buffer, UINT count) {
  while (lz4->out_pos == lz4->out_len) {
    INT in_avail;

    in_avail = lz4->in_len - lz4->in_pos;
    if (!in_avail) {
      while (1) {
        in_avail = WIN32_IZE(read)(lz4->fd, lz4->in_buffer, LZ4_INPUT_PORT_IN_BUFFER_SIZE);
        if (in_avail >= 0) {
          lz4->in_len = in_avail;
          lz4->in_pos = 0;
          break;
#ifdef EINTR
        } else if (errno == EINTR) {
          /* try again */
#endif
        } else {
          lz4->err = Z_ERRNO;
          return -1;
        }
      }
    }

    if (in_avail > 0) {
      size_t amt, out_len = LZ4_INPUT_PORT_OUT_BUFFER_SIZE, in_len = in_avail;

      /* For a large enough result buffer, try to decompress directly
         to that buffer: */
      if (count >= (out_len >> 1)) {
        size_t direct_out_len = count;

        if (lz4->frame_ended && lz4->in_buffer[lz4->in_pos] == 0)
          return 0; /* count 0 after frame as stream terminator */

        amt = LZ4F_decompress(lz4->dctx,
                              buffer, &direct_out_len,
                              lz4->in_buffer + lz4->in_pos, &in_len,
                              NULL);
        lz4->frame_ended = (amt == 0);

        if (LZ4F_isError(amt)) {
          lz4->err = Z_STREAM_ERROR;
          return -1;
        }

        lz4->in_pos += (INT)in_len;

        if (direct_out_len) {
          lz4->stream_pos += direct_out_len;
          return (INT)direct_out_len;
        }

        in_len = in_avail - in_len;
      }

      if (in_len > 0) {
        if (lz4->frame_ended && lz4->in_buffer[lz4->in_pos] == 0)
          return 0; /* count 0 after frame as stream terminator */

        amt = LZ4F_decompress(lz4->dctx,
                              lz4->out_buffer, &out_len,
                              lz4->in_buffer + lz4->in_pos, &in_len,
                              NULL);
        lz4->frame_ended = (amt == 0);

        if (LZ4F_isError(amt)) {
          lz4->err = Z_STREAM_ERROR;
          return -1;
        }

        lz4->in_pos += (INT)in_len;
        lz4->out_len = (INT)out_len;
        lz4->out_pos = 0;
      }
    } else {
      /* EOF on read */
      break;
    }
  }

  if (lz4->out_pos < lz4->out_len) {
    UINT amt = lz4->out_len - lz4->out_pos;
    if (amt > count) amt = count;
    memcpy(buffer, lz4->out_buffer + lz4->out_pos, amt);
    lz4->out_pos += amt;
    lz4->stream_pos += amt;
    return amt;
  }

  return 0;
}

INT S_glzread(glzFile glz, void *buffer, UINT count) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    return gzread(glz->gz, buffer, count);
  case COMPRESS_LZ4:
    return glzread_lz4(glz->lz4_in, buffer, count);
  default:
    S_error1("S_glzread", "unexpected compress format ~s", Sinteger(glz->format));
    return -1;
  }
}

static INT glzemit_lz4(lz4File_out *lz4, void *buffer, UINT count) {
  ptr tc = get_thread_context();
  sized_buffer *cached_out_buffer;
  char *out_buffer;
  INT out_len, out_pos;
  INT r = 0;

  /* allocate one out_buffer (per thread) since we don't need one for each file.
     the buffer is freed by destroy_thread. */
  if ((cached_out_buffer = LZ4OUTBUFFER(tc)) == NULL || cached_out_buffer->size < lz4->out_buffer_size) {
    if (cached_out_buffer != NULL) free(cached_out_buffer);
    if ((LZ4OUTBUFFER(tc) = cached_out_buffer = malloc(sizeof(sized_buffer) + lz4->out_buffer_size)) == NULL) return -1;
    cached_out_buffer->size = lz4->out_buffer_size;
  }
  out_buffer = cached_out_buffer->buffer;

  out_len = (INT)LZ4F_compressFrame(out_buffer, lz4->out_buffer_size,
                                    buffer, count,
                                    &lz4->preferences);
  if (LZ4F_isError(out_len)) {
    lz4->err = Z_STREAM_ERROR;
    return -1;
  }

  out_pos = 0;
  while (out_pos < out_len) {
    r = WIN32_IZE(write)(lz4->fd, out_buffer + out_pos, out_len - out_pos);
    if (r >= 0)
      out_pos += r;
#ifdef EINTR
    else if (errno == EINTR)
      continue;
#endif
    else
      break;
  }

  return r;
}

static INT glzwrite_lz4(lz4File_out *lz4, void *buffer, UINT count) {
  UINT amt; INT r;

  /* accept at most as many bytes as fit in the remaining input buffer */
  if ((amt = LZ4_OUTPUT_PORT_IN_BUFFER_SIZE - lz4->in_pos) > count) amt = count;

  if (amt == LZ4_OUTPUT_PORT_IN_BUFFER_SIZE) {
    /* full buffer coming from input...skip the memcpy */
    /* failure is reported as 0, as gzwrite does */
    if ((r = glzemit_lz4(lz4, buffer, LZ4_OUTPUT_PORT_IN_BUFFER_SIZE)) < 0) return 0;
  } else {
    memcpy(lz4->in_buffer + lz4->in_pos, buffer, amt);
    if ((lz4->in_pos += amt) == LZ4_OUTPUT_PORT_IN_BUFFER_SIZE) {
      lz4->in_pos = 0;
      if ((r = glzemit_lz4(lz4, lz4->in_buffer, LZ4_OUTPUT_PORT_IN_BUFFER_SIZE)) < 0) return 0;
    }
  }

  lz4->stream_pos += amt;
  return amt;
}

INT S_glzwrite(glzFile glz, void *buffer, UINT count) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    return gzwrite(glz->gz, buffer, count);
  case COMPRESS_LZ4:
    return glzwrite_lz4(glz->lz4_out, buffer, count);
  default:
    S_error1("S_glzwrite", "unexpected compress format ~s", Sinteger(glz->format));
    return -1;
  }
}

long S_glzseek(glzFile glz, long offset, INT whence) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    return gzseek(glz->gz, offset, whence);
  case COMPRESS_LZ4:
    if (glz->inputp) {
      lz4File_in *lz4 = glz->lz4_in;
      if (whence == SEEK_CUR)
        offset += (long)lz4->stream_pos;
      if (offset < 0)
        offset = 0;
      if ((size_t)offset < lz4->stream_pos) {
        /* rewind and read from start */
        if (WIN32_IZE(lseek)(lz4->fd, lz4->init_pos, SEEK_SET) < 0) {
          lz4->err = Z_ERRNO;
          return -1;
        }
        LZ4F_resetDecompressionContext(lz4->dctx);
        lz4->in_pos = 0;
        lz4->in_len = 0;
        lz4->out_len = 0;
        lz4->out_pos = 0;
        lz4->err = 0;
        lz4->stream_pos = 0;
      }
      /* move forward by reading and discarding decompressed data */
      while ((size_t)offset > lz4->stream_pos) {
        static char buffer[1024]; /* scratch space; contents are never used */
        size_t amt = (size_t)offset - lz4->stream_pos;
        if (amt > sizeof(buffer)) amt = sizeof(buffer);
        if (glzread_lz4(lz4, buffer, (UINT)amt) < 0)
          return -1;
      }
      return (long)lz4->stream_pos;
    } else {
      lz4File_out *lz4 = glz->lz4_out;
      if (whence == SEEK_CUR)
        offset += (long)lz4->stream_pos;
      if (offset >= 0) {
        /* move forward by writing zero bytes */
        while ((size_t)offset > lz4->stream_pos) {
          size_t amt = (size_t)offset - lz4->stream_pos;
          if (amt > 8) amt = 8;
          /* glzwrite_lz4 reports failure as 0, so treat 0 as an error
             here rather than retrying forever */
          if (glzwrite_lz4(lz4, "\0\0\0\0\0\0\0\0", (UINT)amt) <= 0)
            return -1;
        }
      }
      return (long)lz4->stream_pos;
    }
  default:
    S_error1("S_glzseek", "unexpected compress format ~s", Sinteger(glz->format));
    return -1;
  }
}

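/* Illustration only, not part of this file: for LZ4 input, S_glzseek above
   emulates seeking on a forward-only stream by rewinding when the target is
   behind the current position and then reading and discarding data. The
   sketch below shows the same idea over an in-memory byte array. All names
   (example_stream, example_read, example_seek) are hypothetical, and
   stopping at end of data is a choice made for the example. */

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical forward-only stream over an in-memory byte array. */
typedef struct {
  const char *data;
  size_t len;
  size_t pos;
} example_stream;

/* Read up to `count` bytes; returns the number read (0 at end of data). */
static size_t example_read(example_stream *s, char *out, size_t count) {
  size_t avail;

  assert(s->pos <= s->len);
  avail = s->len - s->pos;
  if (count > avail) count = avail;
  memcpy(out, s->data + s->pos, count);
  s->pos += count;
  return count;
}

/* Seek to absolute offset `target`: rewind if the target is behind us,
   then read and discard until we get there. Returns the resulting
   position, which is short of `target` if the data runs out first. */
static size_t example_seek(example_stream *s, size_t target) {
  char scratch[16];

  if (target < s->pos) s->pos = 0;   /* rewind and read from start */
  while (s->pos < target) {
    size_t amt = target - s->pos;
    if (amt > sizeof(scratch)) amt = sizeof(scratch);
    if (example_read(s, scratch, amt) == 0) break;   /* end of data */
  }
  return s->pos;
}
```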
INT S_glzgetc(glzFile glz) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    return gzgetc(glz->gz);
  case COMPRESS_LZ4:
    {
      unsigned char buffer[1];
      INT r;
      r = S_glzread(glz, buffer, 1);
      if (r == 1)
        return buffer[0];
      else
        return -1;
    }
  default:
    S_error1("S_glzgetc", "unexpected compress format ~s", Sinteger(glz->format));
    return -1;
  }
}

INT S_glzungetc(INT c, glzFile glz) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    return gzungetc(c, glz->gz);
  case COMPRESS_LZ4:
    {
      lz4File_in *lz4 = glz->lz4_in;
      if (lz4->out_len == 0)
        lz4->out_len = lz4->out_pos = 1;
      if (lz4->out_pos) {
        lz4->out_pos--;
        lz4->out_buffer[lz4->out_pos] = c;
        lz4->stream_pos--;
        return c;
      } else {
        /* support ungetc only just after a getc, in which case there
           should have been room */
        return -1;
      }
    }
  default:
    S_error1("S_glzungetc", "unexpected compress format ~s", Sinteger(glz->format));
    return -1;
  }
}

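/* Illustration only, not part of this file: the LZ4 branch of S_glzungetc
   above pushes a byte back by stepping the read position back one slot in
   the already-decoded buffer, which works only if at least one byte has
   been consumed. The sketch below shows that rule on a tiny fixed buffer.
   The names example_in, example_getc, and example_ungetc are hypothetical. */

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical input buffer: bytes [pos, len) are still unread. */
typedef struct {
  unsigned char buf[8];
  size_t pos;
  size_t len;
} example_in;

/* Return the next byte, or -1 when the buffer is exhausted. */
static int example_getc(example_in *in) {
  assert(in->pos <= in->len);
  if (in->pos == in->len) return -1;
  return in->buf[in->pos++];
}

/* Push `c` back so the next example_getc returns it. Fails with -1 when
   nothing has been consumed, since there is then no free slot before pos. */
static int example_ungetc(int c, example_in *in) {
  if (in->len == 0) in->len = in->pos = 1;   /* empty buffer: open one slot */
  if (in->pos == 0) return -1;
  in->buf[--in->pos] = (unsigned char)c;
  return c;
}
```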
INT S_glzrewind(glzFile glz) {
  return S_glzseek(glz, 0, SEEK_SET);
}

void S_glzerror(glzFile glz, INT *errnum) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    (void)gzerror(glz->gz, errnum);
    break;
  case COMPRESS_LZ4:
    if (glz->inputp)
      *errnum = glz->lz4_in->err;
    else
      *errnum = glz->lz4_out->err;
    break;
  default:
    S_error1("S_glzerror", "unexpected compress format ~s", Sinteger(glz->format));
    *errnum = 0;
  }
}

void S_glzclearerr(glzFile glz) {
  switch (glz->format) {
  case COMPRESS_GZIP:
    gzclearerr(glz->gz);
    break;
  case COMPRESS_LZ4:
    if (glz->inputp)
      glz->lz4_in->err = 0;
    else
      glz->lz4_out->err = 0;
    break;
  default:
    S_error1("S_glzclearerr", "unexpected compress format ~s", Sinteger(glz->format));
  }
}