Updated stb-image from v1.46 to v2.08. Many improvements!
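One change in this update that affects integrators is the new allocator override: the header's release notes say that STBI_MALLOC, STBI_REALLOC, and STBI_FREE must be defined all together or not at all, and that they take no context parameter, so any allocator state has to live in a global or thread-local variable. A minimal sketch of that pattern follows. It does not include stb_image.h itself, and the names `alloc_stats`, `g_stats`, `tracked_malloc`, `tracked_realloc`, `tracked_free`, and `exercise_allocator` are hypothetical helpers made up for illustration, not part of the library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation tracker; it stands in for whatever context a
   real custom allocator would need. Not part of stb_image. */
typedef struct {
    size_t live_blocks;   /* blocks allocated and not yet freed */
    size_t total_allocs;  /* successful fresh allocations so far */
} alloc_stats;

/* The macros take no context parameter, so the context lives in a global
   (a thread-local variable would also work). */
static alloc_stats g_stats = {0, 0};

static void *tracked_malloc(size_t sz)
{
    void *p = malloc(sz);
    if (p != NULL) { g_stats.live_blocks++; g_stats.total_allocs++; }
    return p;
}

static void *tracked_realloc(void *p, size_t sz)
{
    void *q = realloc(p, sz);
    /* realloc(NULL, sz) acts like malloc, so it creates a new live block */
    if (p == NULL && q != NULL) { g_stats.live_blocks++; g_stats.total_allocs++; }
    return q;
}

static void tracked_free(void *p)
{
    if (p != NULL) g_stats.live_blocks--;
    free(p);
}

/* Define all three (or none) before creating the implementation. */
#define STBI_MALLOC(sz)     tracked_malloc(sz)
#define STBI_REALLOC(p,sz)  tracked_realloc(p,sz)
#define STBI_FREE(p)        tracked_free(p)

/* The same all-or-none check that the updated header performs. */
#if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC)
/* ok */
#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC)
/* ok */
#else
#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC."
#endif

/* In a real build, these two lines would come next:
     #define STB_IMAGE_IMPLEMENTATION
     #include "stb_image.h"
*/

/* Exercises the macros roughly the way a decoder might: allocate, grow,
   free. Returns the number of blocks still live afterwards. */
static size_t exercise_allocator(void)
{
    unsigned char *buf = (unsigned char *) STBI_MALLOC(64);
    unsigned char *bigger;
    if (buf == NULL) return g_stats.live_blocks;
    bigger = (unsigned char *) STBI_REALLOC(buf, 256);
    if (bigger != NULL) buf = bigger;
    STBI_FREE(buf);
    return g_stats.live_blocks;
}
```

Because the diff routes `stbi_image_free` through STBI_FREE, pixel buffers returned by the loaders should be released with `stbi_image_free` rather than a direct `free` once a custom allocator is installed.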

+2494 -672
src/externals/stb_image/stb_image.h
··· 1 - /* stb_image - v1.46 - public domain JPEG/PNG reader - http://nothings.org/stb_image.c 2 - when you control the images you're loading 1 + /* stb_image - v2.08 - public domain image loader - http://nothings.org/stb_image.h 3 2 no warranty implied; use at your own risk 4 3 5 4 Do this: 6 5 #define STB_IMAGE_IMPLEMENTATION 7 6 before you include this file in *one* C or C++ file to create the implementation. 8 7 9 - #define STBI_ASSERT(x) to avoid using assert.h. 8 + // i.e. it should look like this: 9 + #include ... 10 + #include ... 11 + #include ... 12 + #define STB_IMAGE_IMPLEMENTATION 13 + #include "stb_image.h" 14 + 15 + You can #define STBI_ASSERT(x) before the #include to avoid using assert.h. 16 + And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free 17 + 10 18 11 19 QUICK NOTES: 12 20 Primarily of interest to game developers and other people who can 13 21 avoid problematic images and only need the trivial interface 14 22 15 - JPEG baseline (no JPEG progressive) 16 - PNG 8-bit-per-channel only 23 + JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib) 24 + PNG 1/2/4/8-bit-per-channel (16 bpc not supported) 17 25 18 26 TGA (not sure what subset, if a subset) 19 27 BMP non-1bpp, non-RLE 20 - PSD (composited view only, no extra channels) 28 + PSD (composited view only, no extra channels, 8/16 bit-per-channel) 21 29 22 30 GIF (*comp always reports as 4-channel) 23 31 HDR (radiance rgbE format) 24 32 PIC (Softimage PIC) 33 + PNM (PPM and PGM binary only) 34 + 35 + Animated GIF still needs a proper API, but here's one way to do it: 36 + http://gist.github.com/urraka/685d9a6340b26b830d49 25 37 26 38 - decode from memory or through FILE (define STBI_NO_STDIO to remove code) 27 39 - decode from arbitrary I/O callbacks 28 - - overridable dequantizing-IDCT, YCbCr-to-RGB conversion (define STBI_SIMD) 40 + - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON) 41 + 42 + Full documentation under 
"DOCUMENTATION" below. 43 + 44 + 45 + Revision 2.00 release notes: 46 + 47 + - Progressive JPEG is now supported. 48 + 49 + - PPM and PGM binary formats are now supported, thanks to Ken Miller. 50 + 51 + - x86 platforms now make use of SSE2 SIMD instructions for 52 + JPEG decoding, and ARM platforms can use NEON SIMD if requested. 53 + This work was done by Fabian "ryg" Giesen. SSE2 is used by 54 + default, but NEON must be enabled explicitly; see docs. 55 + 56 + With other JPEG optimizations included in this version, we see 57 + 2x speedup on a JPEG on an x86 machine, and a 1.5x speedup 58 + on a JPEG on an ARM machine, relative to previous versions of this 59 + library. The same results will not obtain for all JPGs and for all 60 + x86/ARM machines. (Note that progressive JPEGs are significantly 61 + slower to decode than regular JPEGs.) This doesn't mean that this 62 + is the fastest JPEG decoder in the land; rather, it brings it 63 + closer to parity with standard libraries. If you want the fastest 64 + decode, look elsewhere. (See "Philosophy" section of docs below.) 65 + 66 + See final bullet items below for more info on SIMD. 67 + 68 + - Added STBI_MALLOC, STBI_REALLOC, and STBI_FREE macros for replacing 69 + the memory allocator. Unlike other STBI libraries, these macros don't 70 + support a context parameter, so if you need to pass a context in to 71 + the allocator, you'll have to store it in a global or a thread-local 72 + variable. 
29 73 30 - Latest revisions: 31 - 1.46 (2014-08-26) fix broken tRNS chunk in non-paletted PNG 32 - 1.45 (2014-08-16) workaround MSVC-ARM internal compiler error by wrapping malloc 33 - 1.44 (2014-08-07) warnings 34 - 1.43 (2014-07-15) fix MSVC-only bug in 1.42 35 - 1.42 (2014-07-09) no _CRT_SECURE_NO_WARNINGS; error-path fixes; STBI_ASSERT 36 - 1.41 (2014-06-25) fix search&replace that messed up comments/error messages 37 - 1.40 (2014-06-22) gcc warning 38 - 1.39 (2014-06-15) TGA optimization bugfix, multiple BMP fixes 39 - 1.38 (2014-06-06) suppress MSVC run-time warnings, fix accidental rename of 'skip' 40 - 1.37 (2014-06-04) remove duplicate typedef 41 - 1.36 (2014-06-03) converted to header file, allow reading incorrect iphoned-images without iphone flag 42 - 1.35 (2014-05-27) warnings, bugfixes, TGA optimization, etc 74 + - Split existing STBI_NO_HDR flag into two flags, STBI_NO_HDR and 75 + STBI_NO_LINEAR. 76 + STBI_NO_HDR: suppress implementation of .hdr reader format 77 + STBI_NO_LINEAR: suppress high-dynamic-range light-linear float API 43 78 44 - See end of file for full revision history. 79 + - You can suppress implementation of any of the decoders to reduce 80 + your code footprint by #defining one or more of the following 81 + symbols before creating the implementation. 
45 82 46 - TODO: 47 - stbi_info support for BMP,PSD,HDR,PIC 83 + STBI_NO_JPEG 84 + STBI_NO_PNG 85 + STBI_NO_BMP 86 + STBI_NO_PSD 87 + STBI_NO_TGA 88 + STBI_NO_GIF 89 + STBI_NO_HDR 90 + STBI_NO_PIC 91 + STBI_NO_PNM (.ppm and .pgm) 92 + 93 + - You can request *only* certain decoders and suppress all other ones 94 + (this will be more forward-compatible, as addition of new decoders 95 + doesn't require you to disable them explicitly): 96 + 97 + STBI_ONLY_JPEG 98 + STBI_ONLY_PNG 99 + STBI_ONLY_BMP 100 + STBI_ONLY_PSD 101 + STBI_ONLY_TGA 102 + STBI_ONLY_GIF 103 + STBI_ONLY_HDR 104 + STBI_ONLY_PIC 105 + STBI_ONLY_PNM (.ppm and .pgm) 106 + 107 + Note that you can define multiples of these, and you will get all 108 + of them ("only x" and "only y" is interpreted to mean "only x&y"). 109 + 110 + - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still 111 + want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB 112 + 113 + - Compilation of all SIMD code can be suppressed with 114 + #define STBI_NO_SIMD 115 + It should not be necessary to disable SIMD unless you have issues 116 + compiling (e.g. using an x86 compiler which doesn't support SSE 117 + intrinsics or that doesn't support the method used to detect 118 + SSE2 support at run-time), and even those can be reported as 119 + bugs so I can refine the built-in compile-time checking to be 120 + smarter. 121 + 122 + - The old STBI_SIMD system which allowed installing a user-defined 123 + IDCT etc. has been removed. If you need this, don't upgrade. My 124 + assumption is that almost nobody was doing this, and those who 125 + were will find the built-in SIMD more satisfactory anyway. 126 + 127 + - RGB values computed for JPEG images are slightly different from 128 + previous versions of stb_image. (This is due to using less 129 + integer precision in SIMD.) 
The C code has been adjusted so 130 + that the same RGB values will be computed regardless of whether 131 + SIMD support is available, so your app should always produce 132 + consistent results. But these results are slightly different from 133 + previous versions. (Specifically, about 3% of available YCbCr values 134 + will compute different RGB results from pre-1.49 versions by +-1; 135 + most of the deviating values are one smaller in the G channel.) 136 + 137 + - If you must produce consistent results with previous versions of 138 + stb_image, #define STBI_JPEG_OLD and you will get the same results 139 + you used to; however, you will not get the SIMD speedups for 140 + the YCbCr-to-RGB conversion step (although you should still see 141 + significant JPEG speedup from the other changes). 142 + 143 + Please note that STBI_JPEG_OLD is a temporary feature; it will be 144 + removed in future versions of the library. It is only intended for 145 + near-term back-compatibility use. 146 + 147 + 148 + Latest revision history: 149 + 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA 150 + 2.07 (2015-09-13) partial animated GIF support 151 + limited 16-bit PSD support 152 + minor bugs, code cleanup, and compiler warnings 153 + 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value 154 + 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning 155 + 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit 156 + 2.03 (2015-04-12) additional corruption checking 157 + stbi_set_flip_vertically_on_load 158 + fix NEON support; fix mingw support 159 + 2.02 (2015-01-19) fix incorrect assert, fix warning 160 + 2.01 (2015-01-17) fix various warnings 161 + 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG 162 + 2.00 (2014-12-25) optimize JPEG, including x86 SSE2 & ARM NEON SIMD 163 + progressive JPEG 164 + PGM/PPM support 165 + STBI_MALLOC,STBI_REALLOC,STBI_FREE 166 + STBI_NO_*, STBI_ONLY_* 167 + GIF bugfix 168 + 1.48 (2014-12-14) fix 
incorrectly-named assert() 169 + 1.47 (2014-12-14) 1/2/4-bit PNG support (both grayscale and paletted) 170 + optimize PNG 171 + fix bug in interlaced PNG with user-specified channel count 172 + 173 + See end of file for full revision history. 48 174 49 175 50 176 ============================ Contributors ========================= 51 - 177 + 52 178 Image formats Bug fixes & warning fixes 53 179 Sean Barrett (jpeg, png, bmp) Marc LeBlanc 54 180 Nicolas Schulz (hdr, psd) Christpher Lloyd ··· 56 182 Jean-Marc Lienher (gif) Won Chun 57 183 Tom Seddon (pic) the Horde3D community 58 184 Thatcher Ulrich (psd) Janez Zemva 59 - Jonathan Blow 60 - Laurent Gomila 61 - Extensions, features Aruelien Pocheville 62 - Jetro Lauha (stbi_info) Ryamond Barbiero 63 - James "moose2000" Brown (iPhone PNG) David Woo 64 - Ben "Disch" Wenger (io callbacks) Roy Eltham 65 - Martin "SpartanJ" Golini Luke Graham 66 - Thomas Ruf 67 - John Bartholomew 68 - Optimizations & bugfixes Ken Hamada 69 - Fabian "ryg" Giesen Cort Stratton 70 - Arseny Kapoulkine Blazej Dariusz Roszkowski 185 + Ken Miller (pgm, ppm) Jonathan Blow 186 + urraka@github (animated gif) Laurent Gomila 187 + Aruelien Pocheville 188 + Ryamond Barbiero 189 + David Woo 190 + Extensions, features Martin Golini 191 + Jetro Lauha (stbi_info) Roy Eltham 192 + Martin "SpartanJ" Golini (stbi_info) Luke Graham 193 + James "moose2000" Brown (iPhone PNG) Thomas Ruf 194 + Ben "Disch" Wenger (io callbacks) John Bartholomew 195 + Omar Cornut (1/2/4-bit PNG) Ken Hamada 196 + Nicolas Guillemot (vertical flip) Cort Stratton 197 + Richard Mitton (16-bit PSD) Blazej Dariusz Roszkowski 71 198 Thibault Reuille 72 199 Paul Du Bois 73 200 Guillaume George 74 201 Jerry Jansson 75 - If your name should be here but Hayaki Saito 76 - isn't, let Sean know. 
Johan Duparc 202 + Hayaki Saito 203 + Johan Duparc 77 204 Ronny Chevalier 78 - Michal Cichon 205 + Optimizations & bugfixes Michal Cichon 206 + Fabian "ryg" Giesen Tero Hanninen 207 + Arseny Kapoulkine Sergio Gonzalez 208 + Cass Everitt 209 + Engin Manap 210 + If your name should be here but Martins Mozeiko 211 + isn't, let Sean know. Joseph Thomson 212 + Phil Jordan 213 + Nathan Reed 214 + Michaelangel007@github 215 + Nick Verigakis 216 + 217 + LICENSE 218 + 219 + This software is in the public domain. Where that dedication is not 220 + recognized, you are granted a perpetual, irrevocable license to copy, 221 + distribute, and modify this file as you see fit. 222 + 79 223 */ 80 224 81 225 #ifndef STBI_INCLUDE_STB_IMAGE_H 82 226 #define STBI_INCLUDE_STB_IMAGE_H 83 227 228 + // DOCUMENTATION 229 + // 84 230 // Limitations: 85 - // - no jpeg progressive support 86 - // - non-HDR formats support 8-bit samples only (jpeg, png) 87 - // - no delayed line count (jpeg) -- IJG doesn't support either 231 + // - no 16-bit-per-channel PNG 232 + // - no 12-bit-per-channel JPEG 233 + // - no JPEGs with arithmetic coding 88 234 // - no 1-bit BMP 89 235 // - GIF always returns *comp=4 90 236 // 91 - // Basic usage (see HDR discussion below): 237 + // Basic usage (see HDR discussion below for HDR usage): 92 238 // int x,y,n; 93 239 // unsigned char *data = stbi_load(filename, &x, &y, &n, 0); 94 - // // ... process data if not NULL ... 240 + // // ... process data if not NULL ... 95 241 // // ... x = width, y = height, n = # 8-bit components per pixel ... 96 242 // // ... replace '0' with '1'..'4' to force that many components per pixel 97 243 // // ... but 'n' will always be the number that it would have been if you said 0 ··· 104 250 // int req_comp -- if non-zero, # of image components requested in result 105 251 // 106 252 // The return value from an image loader is an 'unsigned char *' which points 107 - // to the pixel data. 
The pixel data consists of *y scanlines of *x pixels, 253 + // to the pixel data, or NULL on an allocation failure or if the image is 254 + // corrupt or invalid. The pixel data consists of *y scanlines of *x pixels, 108 255 // with each pixel consisting of N interleaved 8-bit components; the first 109 256 // pixel pointed to is top-left-most in the image. There is no padding between 110 257 // image scanlines or between pixels, regardless of format. The number of 111 258 // components N is 'req_comp' if req_comp is non-zero, or *comp otherwise. 112 259 // If req_comp is non-zero, *comp has the number of components that _would_ 113 260 // have been output otherwise. E.g. if you set req_comp to 4, you will always 114 - // get RGBA output, but you can check *comp to easily see if it's opaque. 261 + // get RGBA output, but you can check *comp to see if it's trivially opaque 262 + // because e.g. there were only 3 channels in the source image. 115 263 // 116 264 // An output image with N components has the following components interleaved 117 265 // in this order in each pixel: ··· 133 281 // 134 282 // =========================================================================== 135 283 // 136 - // iPhone PNG support: 284 + // Philosophy 285 + // 286 + // stb libraries are designed with the following priorities: 287 + // 288 + // 1. easy to use 289 + // 2. easy to maintain 290 + // 3. good performance 291 + // 292 + // Sometimes I let "good performance" creep up in priority over "easy to maintain", 293 + // and for best performance I may provide less-easy-to-use APIs that give higher 294 + // performance, in addition to the easy to use ones. Nevertheless, it's important 295 + // to keep in mind that from the standpoint of you, a client of this library, 296 + // all you care about is #1 and #3, and stb libraries do not emphasize #3 above all. 
297 + // 298 + // Some secondary priorities arise directly from the first two, some of which 299 + // make more explicit reasons why performance can't be emphasized. 300 + // 301 + // - Portable ("ease of use") 302 + // - Small footprint ("easy to maintain") 303 + // - No dependencies ("ease of use") 304 + // 305 + // =========================================================================== 137 306 // 138 - // By default we convert iphone-formatted PNGs back to RGB; nominally they 139 - // would silently load as BGR, except the existing code should have just 140 - // failed on such iPhone PNGs. But you can disable this conversion by 141 - // by calling stbi_convert_iphone_png_to_rgb(0), in which case 142 - // you will always just get the native iphone "format" through. 307 + // I/O callbacks 308 + // 309 + // I/O callbacks allow you to read from arbitrary sources, like packaged 310 + // files or some other source. Data read from callbacks are processed 311 + // through a small internal buffer (currently 128 bytes) to try to reduce 312 + // overhead. 143 313 // 144 - // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per 145 - // pixel to remove any premultiplied alpha *only* if the image file explicitly 146 - // says there's premultiplied data (currently only happens in iPhone images, 147 - // and only if iPhone convert-to-rgb processing is on). 314 + // The three functions you must define are "read" (reads some bytes of data), 315 + // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end). 316 + // 317 + // =========================================================================== 318 + // 319 + // SIMD support 320 + // 321 + // The JPEG decoder will try to automatically use SIMD kernels on x86 when 322 + // supported by the compiler. For ARM Neon support, you must explicitly 323 + // request it. 324 + // 325 + // (The old do-it-yourself SIMD API is no longer supported in the current 326 + // code.) 
327 + // 328 + // On x86, SSE2 will automatically be used when available based on a run-time 329 + // test; if not, the generic C versions are used as a fall-back. On ARM targets, 330 + // the typical path is to have separate builds for NEON and non-NEON devices 331 + // (at least this is true for iOS and Android). Therefore, the NEON support is 332 + // toggled by a build flag: define STBI_NEON to get NEON loops. 333 + // 334 + // The output of the JPEG decoder is slightly different from versions where 335 + // SIMD support was introduced (that is, for versions before 1.49). The 336 + // difference is only +-1 in the 8-bit RGB channels, and only on a small 337 + // fraction of pixels. You can force the pre-1.49 behavior by defining 338 + // STBI_JPEG_OLD, but this will disable some of the SIMD decoding path 339 + // and hence cost some performance. 340 + // 341 + // If for some reason you do not want to use any of SIMD code, or if 342 + // you have issues compiling it, you can disable it entirely by 343 + // defining STBI_NO_SIMD. 148 344 // 149 345 // =========================================================================== 150 346 // ··· 167 363 // (linear) floats to preserve the full dynamic range: 168 364 // 169 365 // float *data = stbi_loadf(filename, &x, &y, &n, 0); 170 - // 366 + // 171 367 // If you load LDR images through this interface, those images will 172 368 // be promoted to floating point values, run through the inverse of 173 369 // constants corresponding to the above: ··· 184 380 // 185 381 // =========================================================================== 186 382 // 187 - // I/O callbacks 383 + // iPhone PNG support: 188 384 // 189 - // I/O callbacks allow you to read from arbitrary sources, like packaged 190 - // files or some other source. Data read from callbacks are processed 191 - // through a small internal buffer (currently 128 bytes) to try to reduce 192 - // overhead. 
385 + // By default we convert iphone-formatted PNGs back to RGB, even though 386 + // they are internally encoded differently. You can disable this conversion 387 + // by by calling stbi_convert_iphone_png_to_rgb(0), in which case 388 + // you will always just get the native iphone "format" through (which 389 + // is BGR stored in RGB). 193 390 // 194 - // The three functions you must define are "read" (reads some bytes of data), 195 - // "skip" (skips some bytes of data), "eof" (reports if the stream is at the end). 391 + // Call stbi_set_unpremultiply_on_load(1) as well to force a divide per 392 + // pixel to remove any premultiplied alpha *only* if the image file explicitly 393 + // says there's premultiplied data (currently only happens in iPhone images, 394 + // and only if iPhone convert-to-rgb processing is on). 395 + // 196 396 197 397 198 398 #ifndef STBI_NO_STDIO ··· 232 432 // load image by filename, open file, or memory buffer 233 433 // 234 434 235 - STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp); 236 - 237 - #ifndef STBI_NO_STDIO 238 - STBIDEF stbi_uc *stbi_load (char const *filename, int *x, int *y, int *comp, int req_comp); 239 - STBIDEF stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *comp, int req_comp); 240 - // for stbi_load_from_file, file pointer is left pointing immediately after image 241 - #endif 242 - 243 435 typedef struct 244 436 { 245 - int (*read) (void *user,char *data,int size); // fill 'data' with 'size' bytes. return number of bytes actually read 437 + int (*read) (void *user,char *data,int size); // fill 'data' with 'size' bytes. 
return number of bytes actually read 246 438 void (*skip) (void *user,int n); // skip the next 'n' bytes, or 'unget' the last -n bytes if negative 247 439 int (*eof) (void *user); // returns nonzero if we are at end of file/data 248 440 } stbi_io_callbacks; 249 441 250 - STBIDEF stbi_uc *stbi_load_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp); 442 + STBIDEF stbi_uc *stbi_load (char const *filename, int *x, int *y, int *comp, int req_comp); 443 + STBIDEF stbi_uc *stbi_load_from_memory (stbi_uc const *buffer, int len , int *x, int *y, int *comp, int req_comp); 444 + STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk , void *user, int *x, int *y, int *comp, int req_comp); 445 + 446 + #ifndef STBI_NO_STDIO 447 + STBIDEF stbi_uc *stbi_load_from_file (FILE *f, int *x, int *y, int *comp, int req_comp); 448 + // for stbi_load_from_file, file pointer is left pointing immediately after image 449 + #endif 251 450 252 - #ifndef STBI_NO_HDR 253 - STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp); 451 + #ifndef STBI_NO_LINEAR 452 + STBIDEF float *stbi_loadf (char const *filename, int *x, int *y, int *comp, int req_comp); 453 + STBIDEF float *stbi_loadf_from_memory (stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp); 454 + STBIDEF float *stbi_loadf_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp); 254 455 255 456 #ifndef STBI_NO_STDIO 256 - STBIDEF float *stbi_loadf (char const *filename, int *x, int *y, int *comp, int req_comp); 257 457 STBIDEF float *stbi_loadf_from_file (FILE *f, int *x, int *y, int *comp, int req_comp); 258 458 #endif 259 - 260 - STBIDEF float *stbi_loadf_from_callbacks (stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp); 459 + #endif 261 460 461 + #ifndef STBI_NO_HDR 262 462 STBIDEF void stbi_hdr_to_ldr_gamma(float 
gamma); 263 463 STBIDEF void stbi_hdr_to_ldr_scale(float scale); 464 + #endif 264 465 466 + #ifndef STBI_NO_LINEAR 265 467 STBIDEF void stbi_ldr_to_hdr_gamma(float gamma); 266 468 STBIDEF void stbi_ldr_to_hdr_scale(float scale); 267 469 #endif // STBI_NO_HDR 268 470 269 - // stbi_is_hdr is always defined 471 + // stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR 270 472 STBIDEF int stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user); 271 473 STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len); 272 474 #ifndef STBI_NO_STDIO ··· 277 479 278 480 // get a VERY brief reason for failure 279 481 // NOT THREADSAFE 280 - STBIDEF const char *stbi_failure_reason (void); 482 + STBIDEF const char *stbi_failure_reason (void); 281 483 282 484 // free the loaded image -- this is just free() 283 485 STBIDEF void stbi_image_free (void *retval_from_stbi_load); ··· 303 505 // or just pass them through "as-is" 304 506 STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert); 305 507 508 + // flip the image vertically, so the first pixel in the output array is the bottom left 509 + STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip); 306 510 307 511 // ZLIB client - used by PNG, available for other purposes 308 512 ··· 315 519 STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen); 316 520 317 521 318 - // define faster low-level operations (typically SIMD support) 319 - #ifdef STBI_SIMD 320 - typedef void (*stbi_idct_8x8)(stbi_uc *out, int out_stride, short data[64], unsigned short *dequantize); 321 - // compute an integer IDCT on "input" 322 - // input[x] = data[x] * dequantize[x] 323 - // write results to 'out': 64 samples, each run of 8 spaced by 'out_stride' 324 - // CLAMP results to 0..255 325 - typedef void (*stbi_YCbCr_to_RGB_run)(stbi_uc *output, stbi_uc const *y, stbi_uc const *cb, stbi_uc const *cr, int count, int step); 326 - // 
compute a conversion from YCbCr to RGB 327 - // 'count' pixels 328 - // write pixels to 'output'; each pixel is 'step' bytes (either 3 or 4; if 4, write '255' as 4th), order R,G,B 329 - // y: Y input channel 330 - // cb: Cb input channel; scale/biased to be 0..255 331 - // cr: Cr input channel; scale/biased to be 0..255 332 - 333 - STBIDEF void stbi_install_idct(stbi_idct_8x8 func); 334 - STBIDEF void stbi_install_YCbCr_to_RGB(stbi_YCbCr_to_RGB_run func); 335 - #endif // STBI_SIMD 336 - 337 - 338 522 #ifdef __cplusplus 339 523 } 340 524 #endif ··· 346 530 347 531 #ifdef STB_IMAGE_IMPLEMENTATION 348 532 349 - #ifndef STBI_NO_HDR 533 + #if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \ 534 + || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \ 535 + || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \ 536 + || defined(STBI_ONLY_ZLIB) 537 + #ifndef STBI_ONLY_JPEG 538 + #define STBI_NO_JPEG 539 + #endif 540 + #ifndef STBI_ONLY_PNG 541 + #define STBI_NO_PNG 542 + #endif 543 + #ifndef STBI_ONLY_BMP 544 + #define STBI_NO_BMP 545 + #endif 546 + #ifndef STBI_ONLY_PSD 547 + #define STBI_NO_PSD 548 + #endif 549 + #ifndef STBI_ONLY_TGA 550 + #define STBI_NO_TGA 551 + #endif 552 + #ifndef STBI_ONLY_GIF 553 + #define STBI_NO_GIF 554 + #endif 555 + #ifndef STBI_ONLY_HDR 556 + #define STBI_NO_HDR 557 + #endif 558 + #ifndef STBI_ONLY_PIC 559 + #define STBI_NO_PIC 560 + #endif 561 + #ifndef STBI_ONLY_PNM 562 + #define STBI_NO_PNM 563 + #endif 564 + #endif 565 + 566 + #if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB) 567 + #define STBI_NO_ZLIB 568 + #endif 569 + 570 + 571 + #include <stdarg.h> 572 + #include <stddef.h> // ptrdiff_t on osx 573 + #include <stdlib.h> 574 + #include <string.h> 575 + 576 + #if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) 350 577 #include <math.h> // ldexp 351 - #include <string.h> // strcmp, strtok 352 578 #endif 353 579 354 580 
#ifndef STBI_NO_STDIO 355 581 #include <stdio.h> 356 582 #endif 357 - #include <stdlib.h> 358 - #include <string.h> 583 + 359 584 #ifndef STBI_ASSERT 360 585 #include <assert.h> 361 586 #define STBI_ASSERT(x) assert(x) 362 587 #endif 363 - #include <stdarg.h> 364 - #include <stddef.h> // ptrdiff_t on osx 588 + 365 589 366 590 #ifndef _MSC_VER 367 591 #ifdef __cplusplus ··· 406 630 #define stbi_lrot(x,y) (((x) << (y)) | ((x) >> (32 - (y)))) 407 631 #endif 408 632 633 + #if defined(STBI_MALLOC) && defined(STBI_FREE) && defined(STBI_REALLOC) 634 + // ok 635 + #elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) 636 + // ok 637 + #else 638 + #error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC." 639 + #endif 640 + 641 + #ifndef STBI_MALLOC 642 + #define STBI_MALLOC(sz) malloc(sz) 643 + #define STBI_REALLOC(p,sz) realloc(p,sz) 644 + #define STBI_FREE(p) free(p) 645 + #endif 646 + 647 + // x86/x64 detection 648 + #if defined(__x86_64__) || defined(_M_X64) 649 + #define STBI__X64_TARGET 650 + #elif defined(__i386) || defined(_M_IX86) 651 + #define STBI__X86_TARGET 652 + #endif 653 + 654 + #if defined(__GNUC__) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET)) && !defined(__SSE2__) && !defined(STBI_NO_SIMD) 655 + // NOTE: not clear do we actually need this for the 64-bit path? 
656 + // gcc doesn't support sse2 intrinsics unless you compile with -msse2, 657 + // (but compiling with -msse2 allows the compiler to use SSE2 everywhere; 658 + // this is just broken and gcc are jerks for not fixing it properly 659 + // http://www.virtualdub.org/blog/pivot/entry.php?id=363 ) 660 + #define STBI_NO_SIMD 661 + #endif 662 + 663 + #if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD) 664 + // Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET 665 + // 666 + // 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the 667 + // Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant. 668 + // As a result, enabling SSE2 on 32-bit MinGW is dangerous when not 669 + // simultaneously enabling "-mstackrealign". 670 + // 671 + // See https://github.com/nothings/stb/issues/81 for more information. 672 + // 673 + // So default to no SSE2 on 32-bit MinGW. If you've read this far and added 674 + // -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2. 
675 + #define STBI_NO_SIMD 676 + #endif 677 + 678 + #if !defined(STBI_NO_SIMD) && defined(STBI__X86_TARGET) 679 + #define STBI_SSE2 680 + #include <emmintrin.h> 681 + 682 + #ifdef _MSC_VER 683 + 684 + #if _MSC_VER >= 1400 // not VC6 685 + #include <intrin.h> // __cpuid 686 + static int stbi__cpuid3(void) 687 + { 688 + int info[4]; 689 + __cpuid(info,1); 690 + return info[3]; 691 + } 692 + #else 693 + static int stbi__cpuid3(void) 694 + { 695 + int res; 696 + __asm { 697 + mov eax,1 698 + cpuid 699 + mov res,edx 700 + } 701 + return res; 702 + } 703 + #endif 704 + 705 + #define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name 706 + 707 + static int stbi__sse2_available() 708 + { 709 + int info3 = stbi__cpuid3(); 710 + return ((info3 >> 26) & 1) != 0; 711 + } 712 + #else // assume GCC-style if not VC++ 713 + #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16))) 714 + 715 + static int stbi__sse2_available() 716 + { 717 + #if defined(__GNUC__) && (__GNUC__ * 100 + __GNUC_MINOR__) >= 408 // GCC 4.8 or later 718 + // GCC 4.8+ has a nice way to do this 719 + return __builtin_cpu_supports("sse2"); 720 + #else 721 + // portable way to do this, preferably without using GCC inline ASM? 722 + // just bail for now. 
723 + return 0;
724 + #endif
725 + }
726 + #endif
727 + #endif
728 +
729 + // ARM NEON
730 + #if defined(STBI_NO_SIMD) && defined(STBI_NEON)
731 + #undef STBI_NEON
732 + #endif
733 +
734 + #ifdef STBI_NEON
735 + #include <arm_neon.h>
736 + // assume GCC or Clang on ARM targets
737 + #define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
738 + #endif
739 +
740 + #ifndef STBI_SIMD_ALIGN
741 + #define STBI_SIMD_ALIGN(type, name) type name
742 + #endif
743 +
409 744 ///////////////////////////////////////////////
410 745 //
411 746 // stbi__context struct and start_xxx functions
···
416 751 {
417 752 stbi__uint32 img_x, img_y;
418 753 int img_n, img_out_n;
419 -
754 +
420 755 stbi_io_callbacks io;
421 756 void *io_user_data;
422 757
···
425 760 stbi_uc buffer_start[128];
426 761
427 762 stbi_uc *img_buffer, *img_buffer_end;
428 - stbi_uc *img_buffer_original;
763 + stbi_uc *img_buffer_original, *img_buffer_original_end;
429 764 } stbi__context;
430 765
431 766
···
437 772 s->io.read = NULL;
438 773 s->read_from_callbacks = 0;
439 774 s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
440 - s->img_buffer_end = (stbi_uc *) buffer+len;
775 + s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
441 776 }
442 777
443 778 // initialize a callback-based context
···
449 784 s->read_from_callbacks = 1;
450 785 s->img_buffer_original = s->buffer_start;
451 786 stbi__refill_buffer(s);
787 + s->img_buffer_original_end = s->img_buffer_end;
452 788 }
453 789
454 790 #ifndef STBI_NO_STDIO
···
490 826 // but we just rewind to the beginning of the initial buffer, because
491 827 // we only use it after doing 'test', which only ever looks at at most 92 bytes
492 828 s->img_buffer = s->img_buffer_original;
829 + s->img_buffer_end = s->img_buffer_original_end;
493 830 }
494 831
832 + #ifndef STBI_NO_JPEG
495 833 static int stbi__jpeg_test(stbi__context *s);
496 834 static stbi_uc *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
497 835 static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
836 + #endif
837 +
838 + #ifndef STBI_NO_PNG
498 839 static int stbi__png_test(stbi__context *s);
499 840 static stbi_uc *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
500 841 static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
842 + #endif
843 +
844 + #ifndef STBI_NO_BMP
501 845 static int stbi__bmp_test(stbi__context *s);
502 846 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
847 + static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
848 + #endif
849 +
850 + #ifndef STBI_NO_TGA
503 851 static int stbi__tga_test(stbi__context *s);
504 852 static stbi_uc *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
505 853 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
854 + #endif
855 +
856 + #ifndef STBI_NO_PSD
506 857 static int stbi__psd_test(stbi__context *s);
507 858 static stbi_uc *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
859 + static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
860 + #endif
861 +
508 862 #ifndef STBI_NO_HDR
509 863 static int stbi__hdr_test(stbi__context *s);
510 864 static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
865 + static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
511 866 #endif
867 +
868 + #ifndef STBI_NO_PIC
512 869 static int stbi__pic_test(stbi__context *s);
513 870 static stbi_uc *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
871 + static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
872 + #endif
873 +
874 + #ifndef STBI_NO_GIF
514 875 static int stbi__gif_test(stbi__context *s);
515 876 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
516 877 static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
878 + #endif
517 879
880 + #ifndef STBI_NO_PNM
881 + static int stbi__pnm_test(stbi__context *s);
882 + static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp);
883 + static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
884 + #endif
518 885
519 886 // this is not threadsafe
520 887 static const char *stbi__g_failure_reason;
···
532 899
533 900 static void *stbi__malloc(size_t size)
534 901 {
535 - return malloc(size);
902 + return STBI_MALLOC(size);
536 903 }
537 904
538 905 // stbi__err - error
···
547 914 #define stbi__err(x,y) stbi__err(x)
548 915 #endif
549 916
550 - #define stbi__errpf(x,y) ((float *) (stbi__err(x,y)?NULL:NULL))
551 - #define stbi__errpuc(x,y) ((unsigned char *) (stbi__err(x,y)?NULL:NULL))
917 + #define stbi__errpf(x,y) ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
918 + #define stbi__errpuc(x,y) ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
552 919
553 920 STBIDEF void stbi_image_free(void *retval_from_stbi_load)
554 921 {
555 - free(retval_from_stbi_load);
922 + STBI_FREE(retval_from_stbi_load);
556 923 }
557 924
925 + #ifndef STBI_NO_LINEAR
926 + static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
927 + #endif
928 +
558 929 #ifndef STBI_NO_HDR
559 - static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
560 930 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp);
561 931 #endif
562 932
563 - static unsigned char *stbi_load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
933 + static int stbi__vertically_flip_on_load = 0;
934 +
935 + STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
936 + {
937 + stbi__vertically_flip_on_load = flag_true_if_should_flip;
938 + }
939 +
940 + static unsigned char *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
564 941 {
942 + #ifndef STBI_NO_JPEG
565 943 if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp);
944 + #endif
945 + #ifndef STBI_NO_PNG
566 946 if (stbi__png_test(s)) return stbi__png_load(s,x,y,comp,req_comp);
947 + #endif
948 + #ifndef STBI_NO_BMP
567 949 if (stbi__bmp_test(s)) return stbi__bmp_load(s,x,y,comp,req_comp);
950 + #endif
951 + #ifndef STBI_NO_GIF
568 952 if (stbi__gif_test(s)) return stbi__gif_load(s,x,y,comp,req_comp);
953 + #endif
954 + #ifndef STBI_NO_PSD
569 955 if (stbi__psd_test(s)) return stbi__psd_load(s,x,y,comp,req_comp);
956 + #endif
957 + #ifndef STBI_NO_PIC
570 958 if (stbi__pic_test(s)) return stbi__pic_load(s,x,y,comp,req_comp);
959 + #endif
960 + #ifndef STBI_NO_PNM
961 + if (stbi__pnm_test(s)) return stbi__pnm_load(s,x,y,comp,req_comp);
962 + #endif
571 963
572 964 #ifndef STBI_NO_HDR
573 965 if (stbi__hdr_test(s)) {
···
576 968 }
577 969 #endif
578 970
971 + #ifndef STBI_NO_TGA
579 972 // test tga last because it's a crappy test!
580 973 if (stbi__tga_test(s))
581 974 return stbi__tga_load(s,x,y,comp,req_comp);
975 + #endif
976 +
582 977 return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
583 978 }
584 979
980 + static unsigned char *stbi__load_flip(stbi__context *s, int *x, int *y, int *comp, int req_comp)
981 + {
982 + unsigned char *result = stbi__load_main(s, x, y, comp, req_comp);
983 +
984 + if (stbi__vertically_flip_on_load && result != NULL) {
985 + int w = *x, h = *y;
986 + int depth = req_comp ? req_comp : *comp;
987 + int row,col,z;
988 + stbi_uc temp;
989 +
990 + // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
991 + for (row = 0; row < (h>>1); row++) {
992 + for (col = 0; col < w; col++) {
993 + for (z = 0; z < depth; z++) {
994 + temp = result[(row * w + col) * depth + z];
995 + result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
996 + result[((h - row - 1) * w + col) * depth + z] = temp;
997 + }
998 + }
999 + }
1000 + }
1001 +
1002 + return result;
1003 + }
1004 +
1005 + #ifndef STBI_NO_HDR
1006 + static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
1007 + {
1008 + if (stbi__vertically_flip_on_load && result != NULL) {
1009 + int w = *x, h = *y;
1010 + int depth = req_comp ? req_comp : *comp;
1011 + int row,col,z;
1012 + float temp;
1013 +
1014 + // @OPTIMIZE: use a bigger temp buffer and memcpy multiple pixels at once
1015 + for (row = 0; row < (h>>1); row++) {
1016 + for (col = 0; col < w; col++) {
1017 + for (z = 0; z < depth; z++) {
1018 + temp = result[(row * w + col) * depth + z];
1019 + result[(row * w + col) * depth + z] = result[((h - row - 1) * w + col) * depth + z];
1020 + result[((h - row - 1) * w + col) * depth + z] = temp;
1021 + }
1022 + }
1023 + }
1024 + }
1025 + }
1026 + #endif
1027 +
585 1028 #ifndef STBI_NO_STDIO
586 1029
587 - FILE *stbi__fopen(char const *filename, char const *mode)
1030 + static FILE *stbi__fopen(char const *filename, char const *mode)
588 1031 {
589 1032 FILE *f;
590 1033 #if defined(_MSC_VER) && _MSC_VER >= 1400
···
597 1040 }
598 1041
599 1042
600 - STBIDEF unsigned char *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
1043 + STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
601 1044 {
602 1045 FILE *f = stbi__fopen(filename, "rb");
603 1046 unsigned char *result;
···
607 1050 return result;
608 1051 }
609 1052
610 - STBIDEF unsigned char *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1053 + STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
611 1054 {
612 1055 unsigned char *result;
613 1056 stbi__context s;
614 1057 stbi__start_file(&s,f);
615 - result = stbi_load_main(&s,x,y,comp,req_comp);
1058 + result = stbi__load_flip(&s,x,y,comp,req_comp);
616 1059 if (result) {
617 1060 // need to 'unget' all the characters in the IO buffer
618 1061 fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
···
621 1064 }
622 1065 #endif //!STBI_NO_STDIO
623 1066
624 - STBIDEF unsigned char *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1067 + STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
625 1068 {
626 1069 stbi__context s;
627 1070 stbi__start_mem(&s,buffer,len);
628 - return stbi_load_main(&s,x,y,comp,req_comp);
1071 + return stbi__load_flip(&s,x,y,comp,req_comp);
629 1072 }
630 1073
631 - unsigned char *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1074 + STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
632 1075 {
633 1076 stbi__context s;
634 1077 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
635 - return stbi_load_main(&s,x,y,comp,req_comp);
1078 + return stbi__load_flip(&s,x,y,comp,req_comp);
636 1079 }
637 1080
638 - #ifndef STBI_NO_HDR
639 -
640 - float *stbi_loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
1081 + #ifndef STBI_NO_LINEAR
1082 + static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
641 1083 {
642 1084 unsigned char *data;
643 1085 #ifndef STBI_NO_HDR
644 - if (stbi__hdr_test(s))
645 - return stbi__hdr_load(s,x,y,comp,req_comp);
1086 + if (stbi__hdr_test(s)) {
1087 + float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp);
1088 + if (hdr_data)
1089 + stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
1090 + return hdr_data;
1091 + }
646 1092 #endif
647 - data = stbi_load_main(s, x, y, comp, req_comp);
1093 + data = stbi__load_flip(s, x, y, comp, req_comp);
648 1094 if (data)
649 1095 return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
650 1096 return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
651 1097 }
652 1098
653 - float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
1099 + STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
654 1100 {
655 1101 stbi__context s;
656 1102 stbi__start_mem(&s,buffer,len);
657 - return stbi_loadf_main(&s,x,y,comp,req_comp);
1103 + return stbi__loadf_main(&s,x,y,comp,req_comp);
658 1104 }
659 1105
660 - float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
1106 + STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
661 1107 {
662 1108 stbi__context s;
663 1109 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
664 - return stbi_loadf_main(&s,x,y,comp,req_comp);
1110 + return stbi__loadf_main(&s,x,y,comp,req_comp);
665 1111 }
666 1112
667 1113 #ifndef STBI_NO_STDIO
668 - float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
1114 + STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
669 1115 {
670 1116 float *result;
671 1117 FILE *f = stbi__fopen(filename, "rb");
···
675 1121 return result;
676 1122 }
677 1123
678 - float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
1124 + STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
679 1125 {
680 1126 stbi__context s;
681 1127 stbi__start_file(&s,f);
682 - return stbi_loadf_main(&s,x,y,comp,req_comp);
1128 + return stbi__loadf_main(&s,x,y,comp,req_comp);
683 1129 }
684 1130 #endif // !STBI_NO_STDIO
685 1131
686 - #endif // !STBI_NO_HDR
1132 + #endif // !STBI_NO_LINEAR
687 1133
688 - // these is-hdr-or-not is defined independent of whether STBI_NO_HDR is
689 - // defined, for API simplicity; if STBI_NO_HDR is defined, it always
1134 + // these is-hdr-or-not is defined independent of whether STBI_NO_LINEAR is
1135 + // defined, for API simplicity; if STBI_NO_LINEAR is defined, it always
690 1136 // reports false!
691 1137
692 - int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
1138 + STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
693 1139 {
694 1140 #ifndef STBI_NO_HDR
695 1141 stbi__context s;
···
721 1167 stbi__start_file(&s,f);
722 1168 return stbi__hdr_test(&s);
723 1169 #else
1170 + STBI_NOTUSED(f);
724 1171 return 0;
725 1172 #endif
726 1173 }
···
733 1180 stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
734 1181 return stbi__hdr_test(&s);
735 1182 #else
1183 + STBI_NOTUSED(clbk);
1184 + STBI_NOTUSED(user);
736 1185 return 0;
737 1186 #endif
738 1187 }
739 1188
740 - #ifndef STBI_NO_HDR
741 1189 static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
742 1190 static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
743 1191
744 - void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
745 - void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
746 -
747 - void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
748 - void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
1192 + #ifndef STBI_NO_LINEAR
1193 + STBIDEF void stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
1194 + STBIDEF void stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
749 1195 #endif
1196 +
1197 + STBIDEF void stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
1198 + STBIDEF void stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
750 1199
751 1200
752 1201 //////////////////////////////////////////////////////////////////////////////
···
756 1205
757 1206 enum
758 1207 {
759 - SCAN_load=0,
760 - SCAN_type,
761 - SCAN_header
1208 + STBI__SCAN_load=0,
1209 + STBI__SCAN_type,
1210 + STBI__SCAN_header
762 1211 };
763 1212
764 1213 static void stbi__refill_buffer(stbi__context *s)
···
766 1215 int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
767 1216 if (n == 0) {
768 1217 // at end of file, treat same as if from memory, but need to handle case
769 - // where s->img_buffer isn't pointing to safe memory, stbi__err.g. 0-byte file
1218 + // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
770 1219 s->read_from_callbacks = 0;
771 1220 s->img_buffer = s->buffer_start;
772 1221 s->img_buffer_end = s->buffer_start+1;
···
797 1246 if (s->read_from_callbacks == 0) return 1;
798 1247 }
799 1248
800 - return s->img_buffer >= s->img_buffer_end;
1249 + return s->img_buffer >= s->img_buffer_end;
801 1250 }
802 1251
803 1252 static void stbi__skip(stbi__context *s, int n)
804 1253 {
1254 + if (n < 0) {
1255 + s->img_buffer = s->img_buffer_end;
1256 + return;
1257 + }
805 1258 if (s->io.read) {
806 1259 int blen = (int) (s->img_buffer_end - s->img_buffer);
807 1260 if (blen < n) {
···
821 1274 int res, count;
822 1275
823 1276 memcpy(buffer, s->img_buffer, blen);
824 -
1277 +
825 1278 count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
826 1279 res = (count == (n-blen));
827 1280 s->img_buffer = s->img_buffer_end;
···
849 1302 return (z << 16) + stbi__get16be(s);
850 1303 }
851 1304
1305 + #if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
1306 + // nothing
1307 + #else
852 1308 static int stbi__get16le(stbi__context *s)
853 1309 {
854 1310 int z = stbi__get8(s);
855 1311 return z + (stbi__get8(s) << 8);
856 1312 }
1313 + #endif
857 1314
1315 + #ifndef STBI_NO_BMP
858 1316 static stbi__uint32 stbi__get32le(stbi__context *s)
859 1317 {
860 1318 stbi__uint32 z = stbi__get16le(s);
861 1319 return z + (stbi__get16le(s) << 16);
862 1320 }
1321 + #endif
1322 +
1323 + #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte without warnings
1324 +
863 1325
864 1326 //////////////////////////////////////////////////////////////////////////////
865 1327 //
866 1328 // generic converter from built-in img_n to req_comp
867 - // individual types do this automatically as much as possible (stbi__err.g. jpeg
1329 + // individual types do this automatically as much as possible (e.g. jpeg
868 1330 // does all cases internally since it needs to colorspace convert anyway,
869 1331 // and it never has alpha, so very few cases ). png can automatically
870 1332 // interleave an alpha=255 channel, but falls back to this for other cases
···
887 1349
888 1350 good = (unsigned char *) stbi__malloc(req_comp * x * y);
889 1351 if (good == NULL) {
890 - free(data);
1352 + STBI_FREE(data);
891 1353 return stbi__errpuc("outofmem", "Out of memory");
892 1354 }
893 1355
···
917 1379 #undef CASE
918 1380 }
919 1381
920 - free(data);
1382 + STBI_FREE(data);
921 1383 return good;
922 1384 }
923 1385
924 - #ifndef STBI_NO_HDR
1386 + #ifndef STBI_NO_LINEAR
925 1387 static float *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
926 1388 {
927 1389 int i,k,n;
928 1390 float *output = (float *) stbi__malloc(x * y * comp * sizeof(float));
929 - if (output == NULL) { free(data); return stbi__errpf("outofmem", "Out of memory"); }
1391 + if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
930 1392 // compute number of non-alpha components
931 1393 if (comp & 1) n = comp; else n = comp-1;
932 1394 for (i=0; i < x*y; ++i) {
···
935 1397 }
936 1398 if (k < comp) output[i*comp + k] = data[i*comp+k]/255.0f;
937 1399 }
938 - free(data);
1400 + STBI_FREE(data);
939 1401 return output;
940 1402 }
1403 + #endif
941 1404
1405 + #ifndef STBI_NO_HDR
942 1406 #define stbi__float2int(x) ((int) (x))
943 1407 static stbi_uc *stbi__hdr_to_ldr(float *data, int x, int y, int comp)
944 1408 {
945 1409 int i,k,n;
946 1410 stbi_uc *output = (stbi_uc *) stbi__malloc(x * y * comp);
947 - if (output == NULL) { free(data); return stbi__errpuc("outofmem", "Out of memory"); }
1411 + if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
948 1412 // compute number of non-alpha components
949 1413 if (comp & 1) n = comp; else n = comp-1;
950 1414 for (i=0; i < x*y; ++i) {
···
961 1425 output[i*comp + k] = (stbi_uc) stbi__float2int(z);
962 1426 }
963 1427 }
964 - free(data);
1428 + STBI_FREE(data);
965 1429 return output;
966 1430 }
967 1431 #endif
968 1432
969 1433 //////////////////////////////////////////////////////////////////////////////
970 1434 //
971 - // "baseline" JPEG/JFIF decoder (not actually fully baseline implementation)
1435 + // "baseline" JPEG/JFIF decoder
972 1436 //
973 1437 // simple implementation
974 - // - channel subsampling of at most 2 in each dimension
975 1438 // - doesn't support delayed output of y-dimension
976 1439 // - simple interface (only one output format: 8-bit interleaved RGB)
977 1440 // - doesn't try to recover corrupt jpegs
···
985 1448 // - quality integer IDCT derived from IJG's 'slow'
986 1449 // performance
987 1450 // - fast huffman; reasonable integer IDCT
1451 + // - some SIMD kernels for common paths on targets with SSE2/NEON
988 1452 // - uses a lot of intermediate memory, could cache poorly
989 - // - load http://nothings.org/remote/anemones.jpg 3 times on 2.8Ghz P4
990 - // stb_jpeg: 1.34 seconds (MSVC6, default release build)
991 - // stb_jpeg: 1.06 seconds (MSVC6, processor = Pentium Pro)
992 - // IJL11.dll: 1.08 seconds (compiled by intel)
993 - // IJG 1998: 0.98 seconds (MSVC6, makefile provided by IJG)
994 - // IJG 1998: 0.95 seconds (MSVC6, makefile + proc=PPro)
1453 +
1454 + #ifndef STBI_NO_JPEG
995 1455
996 1456 // huffman decoding acceleration
997 1457 #define FAST_BITS 9 // larger handles more cases; smaller stomps less cache
···
1009 1469
1010 1470 typedef struct
1011 1471 {
1012 - #ifdef STBI_SIMD
1013 - unsigned short dequant2[4][64];
1014 - #endif
1015 1472 stbi__context *s;
1016 1473 stbi__huffman huff_dc[4];
1017 1474 stbi__huffman huff_ac[4];
1018 1475 stbi_uc dequant[4][64];
1476 + stbi__int16 fast_ac[4][1 << FAST_BITS];
1019 1477
1020 1478 // sizes for components, interleaved MCUs
1021 1479 int img_h_max, img_v_max;
···
1033 1491
1034 1492 int x,y,w2,h2;
1035 1493 stbi_uc *data;
1036 - void *raw_data;
1494 + void *raw_data, *raw_coeff;
1037 1495 stbi_uc *linebuf;
1496 + short *coeff; // progressive only
1497 + int coeff_w, coeff_h; // number of 8x8 coefficient blocks
1038 1498 } img_comp[4];
1039 1499
1040 - stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1500 + stbi__uint32 code_buffer; // jpeg entropy-coded buffer
1041 1501 int code_bits; // number of valid bits
1042 1502 unsigned char marker; // marker seen while filling entropy buffer
1043 1503 int nomore; // flag if we saw a marker so must stop
1044 1504
1505 + int progressive;
1506 + int spec_start;
1507 + int spec_end;
1508 + int succ_high;
1509 + int succ_low;
1510 + int eob_run;
1511 +
1045 1512 int scan_n, order[4];
1046 1513 int restart_interval, todo;
1514 +
1515 + // kernels
1516 + void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
1517 + void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
1518 + stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
1047 1519 } stbi__jpeg;
1048 1520
1049 1521 static int stbi__build_huffman(stbi__huffman *h, int *count)
···
1087 1559 return 1;
1088 1560 }
1089 1561
1562 + // build a table that decodes both magnitude and value of small ACs in
1563 + // one go.
1564 + static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
1565 + {
1566 + int i;
1567 + for (i=0; i < (1 << FAST_BITS); ++i) {
1568 + stbi_uc fast = h->fast[i];
1569 + fast_ac[i] = 0;
1570 + if (fast < 255) {
1571 + int rs = h->values[fast];
1572 + int run = (rs >> 4) & 15;
1573 + int magbits = rs & 15;
1574 + int len = h->size[fast];
1575 +
1576 + if (magbits && len + magbits <= FAST_BITS) {
1577 + // magnitude code followed by receive_extend code
1578 + int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
1579 + int m = 1 << (magbits - 1);
1580 + if (k < m) k += (-1 << magbits) + 1;
1581 + // if the result is small enough, we can fit it in fast_ac table
1582 + if (k >= -128 && k <= 127)
1583 + fast_ac[i] = (stbi__int16) ((k << 8) + (run << 4) + (len + magbits));
1584 + }
1585 + }
1586 + }
1587 + }
1588 +
1090 1589 static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
1091 1590 {
1092 1591 do {
···
1157 1656 return h->values[c];
1158 1657 }
1159 1658
1659 + // bias[n] = (-1<<n) + 1
1660 + static int const stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
1661 +
1160 1662 // combined JPEG 'receive' and JPEG 'extend', since baseline
1161 1663 // always extends everything it receives.
1162 1664 stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
1163 1665 {
1164 - unsigned int m = 1 << (n-1);
1165 1666 unsigned int k;
1667 + int sgn;
1166 1668 if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1167 1669
1168 - #if 1
1670 + sgn = (stbi__int32)j->code_buffer >> 31; // sign bit is always in MSB
1169 1671 k = stbi_lrot(j->code_buffer, n);
1672 + STBI_ASSERT(n >= 0 && n < (int) (sizeof(stbi__bmask)/sizeof(*stbi__bmask)));
1170 1673 j->code_buffer = k & ~stbi__bmask[n];
1171 1674 k &= stbi__bmask[n];
1172 1675 j->code_bits -= n;
1173 - #else
1174 - k = (j->code_buffer >> (32 - n)) & stbi__bmask[n];
1676 + return k + (stbi__jbias[n] & ~sgn);
1677 + }
1678 +
1679 + // get some unsigned bits
1680 + stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
1681 + {
1682 + unsigned int k;
1683 + if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
1684 + k = stbi_lrot(j->code_buffer, n);
1685 + j->code_buffer = k & ~stbi__bmask[n];
1686 + k &= stbi__bmask[n];
1175 1687 j->code_bits -= n;
1176 - j->code_buffer <<= n;
1177 - #endif
1178 - // the following test is probably a random branch that won't
1179 - // predict well. I tried to table accelerate it but failed.
1180 - // maybe it's compiling as a conditional move?
1181 - if (k < m)
1182 - return (-1 << n) + k + 1;
1183 - else
1184 - return k;
1688 + return k;
1689 + }
1690 +
1691 + stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
1692 + {
1693 + unsigned int k;
1694 + if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
1695 + k = j->code_buffer;
1696 + j->code_buffer <<= 1;
1697 + --j->code_bits;
1698 + return k & 0x80000000;
1185 1699 }
1186 1700
1187 1701 // given a value that's at position X in the zigzag stream,
···
1202 1716 };
1203 1717
1204 1718 // decode one 64-entry block--
1205 - static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, int b)
1719 + static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi_uc *dequant)
1206 1720 {
1207 1721 int diff,dc,k;
1208 - int t = stbi__jpeg_huff_decode(j, hdc);
1722 + int t;
1723 +
1724 + if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1725 + t = stbi__jpeg_huff_decode(j, hdc);
1209 1726 if (t < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1210 1727
1211 1728 // 0 all the ac values now so we can do it 32-bits at a time
···
1214 1731 diff = t ? stbi__extend_receive(j, t) : 0;
1215 1732 dc = j->img_comp[b].dc_pred + diff;
1216 1733 j->img_comp[b].dc_pred = dc;
1217 - data[0] = (short) dc;
1734 + data[0] = (short) (dc * dequant[0]);
1218 1735
1219 1736 // decode AC components, see JPEG spec
1220 1737 k = 1;
1221 1738 do {
1222 - int r,s;
1223 - int rs = stbi__jpeg_huff_decode(j, hac);
1224 - if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1225 - s = rs & 15;
1226 - r = rs >> 4;
1227 - if (s == 0) {
1228 - if (rs != 0xf0) break; // end block
1229 - k += 16;
1230 - } else {
1231 - k += r;
1739 + unsigned int zig;
1740 + int c,r,s;
1741 + if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1742 + c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1743 + r = fac[c];
1744 + if (r) { // fast-AC path
1745 + k += (r >> 4) & 15; // run
1746 + s = r & 15; // combined length
1747 + j->code_buffer <<= s;
1748 + j->code_bits -= s;
1232 1749 // decode into unzigzag'd location
1233 - data[stbi__jpeg_dezigzag[k++]] = (short) stbi__extend_receive(j,s);
1750 + zig = stbi__jpeg_dezigzag[k++];
1751 + data[zig] = (short) ((r >> 8) * dequant[zig]);
1752 + } else {
1753 + int rs = stbi__jpeg_huff_decode(j, hac);
1754 + if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1755 + s = rs & 15;
1756 + r = rs >> 4;
1757 + if (s == 0) {
1758 + if (rs != 0xf0) break; // end block
1759 + k += 16;
1760 + } else {
1761 + k += r;
1762 + // decode into unzigzag'd location
1763 + zig = stbi__jpeg_dezigzag[k++];
1764 + data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
1765 + }
1234 1766 }
1235 1767 } while (k < 64);
1236 1768 return 1;
1237 1769 }
1238 1770
1771 + static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
1772 + {
1773 + int diff,dc;
1774 + int t;
1775 + if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1776 +
1777 + if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1778 +
1779 + if (j->succ_high == 0) {
1780 + // first scan for DC coefficient, must be first
1781 + memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
1782 + t = stbi__jpeg_huff_decode(j, hdc);
1783 + diff = t ? stbi__extend_receive(j, t) : 0;
1784 +
1785 + dc = j->img_comp[b].dc_pred + diff;
1786 + j->img_comp[b].dc_pred = dc;
1787 + data[0] = (short) (dc << j->succ_low);
1788 + } else {
1789 + // refinement scan for DC coefficient
1790 + if (stbi__jpeg_get_bit(j))
1791 + data[0] += (short) (1 << j->succ_low);
1792 + }
1793 + return 1;
1794 + }
1795 +
1796 + // @OPTIMIZE: store non-zigzagged during the decode passes,
1797 + // and only de-zigzag when dequantizing
1798 + static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
1799 + {
1800 + int k;
1801 + if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
1802 +
1803 + if (j->succ_high == 0) {
1804 + int shift = j->succ_low;
1805 +
1806 + if (j->eob_run) {
1807 + --j->eob_run;
1808 + return 1;
1809 + }
1810 +
1811 + k = j->spec_start;
1812 + do {
1813 + unsigned int zig;
1814 + int c,r,s;
1815 + if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
1816 + c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
1817 + r = fac[c];
1818 + if (r) { // fast-AC path
1819 + k += (r >> 4) & 15; // run
1820 + s = r & 15; // combined length
1821 + j->code_buffer <<= s;
1822 + j->code_bits -= s;
1823 + zig = stbi__jpeg_dezigzag[k++];
1824 + data[zig] = (short) ((r >> 8) << shift);
1825 + } else {
1826 + int rs = stbi__jpeg_huff_decode(j, hac);
1827 + if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1828 + s = rs & 15;
1829 + r = rs >> 4;
1830 + if (s == 0) {
1831 + if (r < 15) {
1832 + j->eob_run = (1 << r);
1833 + if (r)
1834 + j->eob_run += stbi__jpeg_get_bits(j, r);
1835 + --j->eob_run;
1836 + break;
1837 + }
1838 + k += 16;
1839 + } else {
1840 + k += r;
1841 + zig = stbi__jpeg_dezigzag[k++];
1842 + data[zig] = (short) (stbi__extend_receive(j,s) << shift);
1843 + }
1844 + }
1845 + } while (k <= j->spec_end);
1846 + } else {
1847 + // refinement scan for these AC coefficients
1848 +
1849 + short bit = (short) (1 << j->succ_low);
1850 +
1851 + if (j->eob_run) {
1852 + --j->eob_run;
1853 + for (k = j->spec_start; k <= j->spec_end; ++k) {
1854 + short *p = &data[stbi__jpeg_dezigzag[k]];
1855 + if (*p != 0)
1856 + if (stbi__jpeg_get_bit(j))
1857 + if ((*p & bit)==0) {
1858 + if (*p > 0)
1859 + *p += bit;
1860 + else
1861 + *p -= bit;
1862 + }
1863 + }
1864 + } else {
1865 + k = j->spec_start;
1866 + do {
1867 + int r,s;
1868 + int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
1869 + if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
1870 + s = rs & 15;
1871 + r = rs >> 4;
1872 + if (s == 0) {
1873 + if (r < 15) {
1874 + j->eob_run = (1 << r) - 1;
1875 + if (r)
1876 + j->eob_run += stbi__jpeg_get_bits(j, r);
1877 + r = 64; // force end of block
1878 + } else {
1879 + // r=15 s=0 should write 16 0s, so we just do
1880 + // a run of 15 0s and then write s (which is 0),
1881 + // so we don't have to do anything special here
1882 + }
1883 + } else {
1884 + if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
1885 + // sign bit
1886 + if (stbi__jpeg_get_bit(j))
1887 + s = bit;
1888 + else
1889 + s = -bit;
1890 + }
1891 +
1892 + // advance by r
1893 + while (k <= j->spec_end) {
1894 + short *p = &data[stbi__jpeg_dezigzag[k++]];
1895 + if (*p != 0) {
1896 + if (stbi__jpeg_get_bit(j))
1897 + if ((*p & bit)==0) {
1898 + if (*p > 0)
1899 + *p += bit;
1900 + else
1901 + *p -= bit;
1902 + }
1903 + } else {
1904 + if (r == 0) {
1905 + *p = (short) s;
1906 + break;
1907 + }
1908 + --r;
1909 + }
1910 + }
1911 + } while (k <= j->spec_end);
1912 + }
1913 + }
1914 + return 1;
1915 + }
1916 +
1239 1917 // take a -128..127 value and stbi__clamp it and convert to 0..255
1240 1918 stbi_inline static stbi_uc stbi__clamp(int x)
1241 1919 {
···
1247 1925 return (stbi_uc) x;
1248 1926 }
1249 1927
1250 - #define stbi__f2f(x) (int) (((x) * 4096 + 0.5))
1928 + #define stbi__f2f(x) ((int) (((x) * 4096 + 0.5)))
1251 1929 #define stbi__fsh(x) ((x) << 12)
1252 1930
1253 1931 // derived from jidctint -- DCT_ISLOW
1254 - #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1932 + #define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
1255 1933 int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
1256 1934 p2 = s2; \
1257 1935 p3 = s6; \
1258 - p1 = (p2+p3) * stbi__f2f(0.5411961f); \
1259 - t2 = p1 + p3*stbi__f2f(-1.847759065f); \
1260 - t3 = p1 + p2*stbi__f2f( 0.765366865f); \
1936 + p1 = (p2+p3) * stbi__f2f(0.5411961f); \
1937 + t2 = p1 + p3*stbi__f2f(-1.847759065f); \
1938 + t3 = p1 + p2*stbi__f2f( 0.765366865f); \
1261 1939 p2 = s0; \
1262 1940 p3 = s4; \
1263 - t0 = stbi__fsh(p2+p3); \
1264 - t1 = stbi__fsh(p2-p3); \
1941 + t0 = stbi__fsh(p2+p3); \
1942 + t1 = stbi__fsh(p2-p3); \
1265 1943 x0 = t0+t3; \
1266 1944 x3 = t0-t3; \
1267 1945 x1 = t1+t2; \
···
1274 1952 p4 = t1+t3; \
1275 1953 p1 = t0+t3; \
1276 1954 p2 = t1+t2; \
1277 - p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
1278 - t0 = t0*stbi__f2f( 0.298631336f); \
1279 - t1 = t1*stbi__f2f( 2.053119869f); \
1280 - t2 = t2*stbi__f2f( 3.072711026f); \
1281 - t3 = t3*stbi__f2f( 1.501321110f); \
1282 - p1 = p5 + p1*stbi__f2f(-0.899976223f); \
1283 - p2 = p5 + p2*stbi__f2f(-2.562915447f); \
1284 - p3 = p3*stbi__f2f(-1.961570560f); \
1285 - p4 = p4*stbi__f2f(-0.390180644f); \
1955 + p5 = (p3+p4)*stbi__f2f( 1.175875602f); \
1956 + t0 = t0*stbi__f2f( 0.298631336f); \
1957 + t1 = t1*stbi__f2f( 2.053119869f); \
1958 + t2 = t2*stbi__f2f( 3.072711026f); \
1959 + t3 = t3*stbi__f2f( 1.501321110f); \
1960 + p1 = p5 + p1*stbi__f2f(-0.899976223f); \
1961 + p2 = p5 + p2*stbi__f2f(-2.562915447f); \
1962 + p3 = p3*stbi__f2f(-1.961570560f); \
1963 + p4 = p4*stbi__f2f(-0.390180644f); \
1286 1964 t3 += p1+p4; \
1287 1965 t2 += p2+p3; \
1288 1966 t1 += p2+p4; \
1289 1967 t0 += p1+p3;
1290 1968
1291 - #ifdef STBI_SIMD
1292 - typedef unsigned short stbi_dequantize_t;
1293 - #else
1294 - typedef stbi_uc stbi_dequantize_t;
1295 - #endif
1296 -
1297 - // .344 seconds on 3*anemones.jpg
1298 - static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64], stbi_dequantize_t *dequantize)
1969 + static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
1299 1970 {
1300 1971 int i,val[64],*v=val;
1301 - stbi_dequantize_t *dq = dequantize;
1302 1972 stbi_uc *o;
1303 1973 short *d = data;
1304 1974
1305 1975 // columns
1306 - for (i=0; i < 8; ++i,++d,++dq, ++v) {
1976 + for (i=0; i < 8; ++i,++d, ++v) {
1307 1977 // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
1308 1978 if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
1309 1979 && d[40]==0 && d[48]==0 && d[56]==0) {
···
1311 1981 // (1|2|3|4|5|6|7)==0 0 seconds
1312 1982 // all separate -0.047 seconds
1313 1983 // 1 && 2|3 && 4|5 && 6|7: -0.047 seconds
1314 - int dcterm = d[0] * dq[0] << 2;
1984 + int dcterm = d[0] << 2;
1315 1985 v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
1316 1986 } else {
1317 - STBI__IDCT_1D(d[ 0]*dq[ 0],d[ 8]*dq[ 8],d[16]*dq[16],d[24]*dq[24],
1318 - d[32]*dq[32],d[40]*dq[40],d[48]*dq[48],d[56]*dq[56])
1987 + STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
1319 1988 // constants scaled things up by 1<<12; let's bring them back
1320 1989 // down, but keep 2 extra bits of precision
1321 1990 x0 += 512; x1 += 512; x2 += 512; x3 += 512;
···
1356 2025 }
1357 2026 }
1358 2027
1359 - #ifdef STBI_SIMD
1360 - static stbi_idct_8x8 stbi__idct_installed = stbi__idct_block;
2028 + #ifdef STBI_SSE2
2029 + // sse2 integer IDCT. not the fastest possible implementation but it
2030 + // produces bit-identical results to the generic C version so it's
2031 + // fully "transparent".
2032 + static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
2033 + {
2034 + // This is constructed to match our regular (generic) integer IDCT exactly.
2035 + __m128i row0, row1, row2, row3, row4, row5, row6, row7;
2036 + __m128i tmp;
2037 +
2038 + // dot product constant: even elems=x, odd elems=y
2039 + #define dct_const(x,y) _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
2040 +
2041 + // out(0) = c0[even]*x + c0[odd]*y (c0, x, y 16-bit, out 32-bit)
2042 + // out(1) = c1[even]*x + c1[odd]*y
2043 + #define dct_rot(out0,out1, x,y,c0,c1) \
2044 + __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
2045 + __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
2046 + __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
2047 + __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
2048 + __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
2049 + __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
2050 +
2051 + // out = in << 12 (in 16-bit, out 32-bit)
2052 + #define dct_widen(out, in) \
2053 + __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
2054 + __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
2055 +
2056 + // wide add
2057 + #define dct_wadd(out, a, b) \
2058 + __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
2059 + __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
2060 +
2061 + // wide sub
2062 + #define dct_wsub(out, a, b) \
2063 + __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
2064 + __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
2065 +
2066 + // butterfly a/b, add bias, then shift by "s" and pack
2067 + #define dct_bfly32o(out0, out1, a,b,bias,s) \
2068 + { \
2069 + __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
2070 + __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
2071 + dct_wadd(sum, abiased, b); \
2072 + dct_wsub(dif, abiased, b); \
2073 + out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
2074 + out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h,
s)); \
2075 + }
2076 +
2077 + // 8-bit interleave step (for transposes)
2078 + #define dct_interleave8(a, b) \
2079 + tmp = a; \
2080 + a = _mm_unpacklo_epi8(a, b); \
2081 + b = _mm_unpackhi_epi8(tmp, b)
2082 +
2083 + // 16-bit interleave step (for transposes)
2084 + #define dct_interleave16(a, b) \
2085 + tmp = a; \
2086 + a = _mm_unpacklo_epi16(a, b); \
2087 + b = _mm_unpackhi_epi16(tmp, b)
2088 +
2089 + #define dct_pass(bias,shift) \
2090 + { \
2091 + /* even part */ \
2092 + dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
2093 + __m128i sum04 = _mm_add_epi16(row0, row4); \
2094 + __m128i dif04 = _mm_sub_epi16(row0, row4); \
2095 + dct_widen(t0e, sum04); \
2096 + dct_widen(t1e, dif04); \
2097 + dct_wadd(x0, t0e, t3e); \
2098 + dct_wsub(x3, t0e, t3e); \
2099 + dct_wadd(x1, t1e, t2e); \
2100 + dct_wsub(x2, t1e, t2e); \
2101 + /* odd part */ \
2102 + dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
2103 + dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
2104 + __m128i sum17 = _mm_add_epi16(row1, row7); \
2105 + __m128i sum35 = _mm_add_epi16(row3, row5); \
2106 + dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
2107 + dct_wadd(x4, y0o, y4o); \
2108 + dct_wadd(x5, y1o, y5o); \
2109 + dct_wadd(x6, y2o, y5o); \
2110 + dct_wadd(x7, y3o, y4o); \
2111 + dct_bfly32o(row0,row7, x0,x7,bias,shift); \
2112 + dct_bfly32o(row1,row6, x1,x6,bias,shift); \
2113 + dct_bfly32o(row2,row5, x2,x5,bias,shift); \
2114 + dct_bfly32o(row3,row4, x3,x4,bias,shift); \
2115 + }
2116 +
2117 + __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
2118 + __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
2119 + __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
2120 + __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
2121 + __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f),
stbi__f2f(-1.961570560f));
2122 + __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
2123 + __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
2124 + __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
2125 +
2126 + // rounding biases in column/row passes, see stbi__idct_block for explanation.
2127 + __m128i bias_0 = _mm_set1_epi32(512);
2128 + __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
2129 +
2130 + // load
2131 + row0 = _mm_load_si128((const __m128i *) (data + 0*8));
2132 + row1 = _mm_load_si128((const __m128i *) (data + 1*8));
2133 + row2 = _mm_load_si128((const __m128i *) (data + 2*8));
2134 + row3 = _mm_load_si128((const __m128i *) (data + 3*8));
2135 + row4 = _mm_load_si128((const __m128i *) (data + 4*8));
2136 + row5 = _mm_load_si128((const __m128i *) (data + 5*8));
2137 + row6 = _mm_load_si128((const __m128i *) (data + 6*8));
2138 + row7 = _mm_load_si128((const __m128i *) (data + 7*8));
2139 +
2140 + // column pass
2141 + dct_pass(bias_0, 10);
2142 +
2143 + {
2144 + // 16bit 8x8 transpose pass 1
2145 + dct_interleave16(row0, row4);
2146 + dct_interleave16(row1, row5);
2147 + dct_interleave16(row2, row6);
2148 + dct_interleave16(row3, row7);
2149 +
2150 + // transpose pass 2
2151 + dct_interleave16(row0, row2);
2152 + dct_interleave16(row1, row3);
2153 + dct_interleave16(row4, row6);
2154 + dct_interleave16(row5, row7);
2155 +
2156 + // transpose pass 3
2157 + dct_interleave16(row0, row1);
2158 + dct_interleave16(row2, row3);
2159 + dct_interleave16(row4, row5);
2160 + dct_interleave16(row6, row7);
2161 + }
1361 2162
1362 - STBIDEF void stbi_install_idct(stbi_idct_8x8 func)
2163 + // row pass
2164 + dct_pass(bias_1, 17);
2165 +
2166 + {
2167 + // pack
2168 + __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
2169 + __m128i p1 = _mm_packus_epi16(row2, row3);
2170 +
__m128i p2 = _mm_packus_epi16(row4, row5);
2171 + __m128i p3 = _mm_packus_epi16(row6, row7);
2172 +
2173 + // 8bit 8x8 transpose pass 1
2174 + dct_interleave8(p0, p2); // a0e0a1e1...
2175 + dct_interleave8(p1, p3); // c0g0c1g1...
2176 +
2177 + // transpose pass 2
2178 + dct_interleave8(p0, p1); // a0c0e0g0...
2179 + dct_interleave8(p2, p3); // b0d0f0h0...
2180 +
2181 + // transpose pass 3
2182 + dct_interleave8(p0, p2); // a0b0c0d0...
2183 + dct_interleave8(p1, p3); // a4b4c4d4...
2184 +
2185 + // store
2186 + _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
2187 + _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
2188 + _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
2189 + _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
2190 + _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
2191 + _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
2192 + _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
2193 + _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
2194 + }
2195 +
2196 + #undef dct_const
2197 + #undef dct_rot
2198 + #undef dct_widen
2199 + #undef dct_wadd
2200 + #undef dct_wsub
2201 + #undef dct_bfly32o
2202 + #undef dct_interleave8
2203 + #undef dct_interleave16
2204 + #undef dct_pass
2205 + }
2206 +
2207 + #endif // STBI_SSE2
2208 +
2209 + #ifdef STBI_NEON
2210 +
2211 + // NEON integer IDCT. should produce bit-identical
2212 + // results to the generic C version.
2213 + static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
1363 2214 {
1364 - stbi__idct_installed = func;
2215 + int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
2216 +
2217 + int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
2218 + int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
2219 + int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
2220 + int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
2221 + int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
2222 + int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
2223 + int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
2224 + int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
2225 + int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
2226 + int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
2227 + int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
2228 + int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
2229 +
2230 + #define dct_long_mul(out, inq, coeff) \
2231 + int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
2232 + int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
2233 +
2234 + #define dct_long_mac(out, acc, inq, coeff) \
2235 + int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
2236 + int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
2237 +
2238 + #define dct_widen(out, inq) \
2239 + int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
2240 + int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
2241 +
2242 + // wide add
2243 + #define dct_wadd(out, a, b) \
2244 + int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
2245 + int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
2246 +
2247 + // wide sub
2248 + #define dct_wsub(out, a, b) \
2249 + int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
2250 + int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
2251 +
2252 + // butterfly a/b, then shift using "shiftop" by "s" and pack
2253 + #define dct_bfly32o(out0,out1, a,b,shiftop,s) \
2254
+ { \
2255 + dct_wadd(sum, a, b); \
2256 + dct_wsub(dif, a, b); \
2257 + out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
2258 + out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
2259 + }
2260 +
2261 + #define dct_pass(shiftop, shift) \
2262 + { \
2263 + /* even part */ \
2264 + int16x8_t sum26 = vaddq_s16(row2, row6); \
2265 + dct_long_mul(p1e, sum26, rot0_0); \
2266 + dct_long_mac(t2e, p1e, row6, rot0_1); \
2267 + dct_long_mac(t3e, p1e, row2, rot0_2); \
2268 + int16x8_t sum04 = vaddq_s16(row0, row4); \
2269 + int16x8_t dif04 = vsubq_s16(row0, row4); \
2270 + dct_widen(t0e, sum04); \
2271 + dct_widen(t1e, dif04); \
2272 + dct_wadd(x0, t0e, t3e); \
2273 + dct_wsub(x3, t0e, t3e); \
2274 + dct_wadd(x1, t1e, t2e); \
2275 + dct_wsub(x2, t1e, t2e); \
2276 + /* odd part */ \
2277 + int16x8_t sum15 = vaddq_s16(row1, row5); \
2278 + int16x8_t sum17 = vaddq_s16(row1, row7); \
2279 + int16x8_t sum35 = vaddq_s16(row3, row5); \
2280 + int16x8_t sum37 = vaddq_s16(row3, row7); \
2281 + int16x8_t sumodd = vaddq_s16(sum17, sum35); \
2282 + dct_long_mul(p5o, sumodd, rot1_0); \
2283 + dct_long_mac(p1o, p5o, sum17, rot1_1); \
2284 + dct_long_mac(p2o, p5o, sum35, rot1_2); \
2285 + dct_long_mul(p3o, sum37, rot2_0); \
2286 + dct_long_mul(p4o, sum15, rot2_1); \
2287 + dct_wadd(sump13o, p1o, p3o); \
2288 + dct_wadd(sump24o, p2o, p4o); \
2289 + dct_wadd(sump23o, p2o, p3o); \
2290 + dct_wadd(sump14o, p1o, p4o); \
2291 + dct_long_mac(x4, sump13o, row7, rot3_0); \
2292 + dct_long_mac(x5, sump24o, row5, rot3_1); \
2293 + dct_long_mac(x6, sump23o, row3, rot3_2); \
2294 + dct_long_mac(x7, sump14o, row1, rot3_3); \
2295 + dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
2296 + dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
2297 + dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
2298 + dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
2299 + }
2300 +
2301 + // load
2302 + row0 = vld1q_s16(data + 0*8);
2303 + row1 = vld1q_s16(data + 1*8);
2304 + row2 = vld1q_s16(data + 2*8);
2305 +
row3 = vld1q_s16(data + 3*8);
2306 + row4 = vld1q_s16(data + 4*8);
2307 + row5 = vld1q_s16(data + 5*8);
2308 + row6 = vld1q_s16(data + 6*8);
2309 + row7 = vld1q_s16(data + 7*8);
2310 +
2311 + // add DC bias
2312 + row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
2313 +
2314 + // column pass
2315 + dct_pass(vrshrn_n_s32, 10);
2316 +
2317 + // 16bit 8x8 transpose
2318 + {
2319 + // these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
2320 + // whether compilers actually get this is another story, sadly.
2321 + #define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
2322 + #define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
2323 + #define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
2324 +
2325 + // pass 1
2326 + dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
2327 + dct_trn16(row2, row3);
2328 + dct_trn16(row4, row5);
2329 + dct_trn16(row6, row7);
2330 +
2331 + // pass 2
2332 + dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
2333 + dct_trn32(row1, row3);
2334 + dct_trn32(row4, row6);
2335 + dct_trn32(row5, row7);
2336 +
2337 + // pass 3
2338 + dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
2339 + dct_trn64(row1, row5);
2340 + dct_trn64(row2, row6);
2341 + dct_trn64(row3, row7);
2342 +
2343 + #undef dct_trn16
2344 + #undef dct_trn32
2345 + #undef dct_trn64
2346 + }
2347 +
2348 + // row pass
2349 + // vrshrn_n_s32 only supports shifts up to 16, we need
2350 + // 17. so do a non-rounding shift of 16 first then follow
2351 + // up with a rounding shift by 1.
2352 + dct_pass(vshrn_n_s32, 16);
2353 +
2354 + {
2355 + // pack and round
2356 + uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
2357 + uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
2358 + uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
2359 + uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
2360 + uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
2361 + uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
2362 + uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
2363 + uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
2364 +
2365 + // again, these can translate into one instruction, but often don't.
2366 + #define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
2367 + #define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
2368 + #define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
2369 +
2370 + // sadly can't use interleaved stores here since we only write
2371 + // 8 bytes to each scan line!
2372 +
2373 + // 8x8 8-bit transpose pass 1
2374 + dct_trn8_8(p0, p1);
2375 + dct_trn8_8(p2, p3);
2376 + dct_trn8_8(p4, p5);
2377 + dct_trn8_8(p6, p7);
2378 +
2379 + // pass 2
2380 + dct_trn8_16(p0, p2);
2381 + dct_trn8_16(p1, p3);
2382 + dct_trn8_16(p4, p6);
2383 + dct_trn8_16(p5, p7);
2384 +
2385 + // pass 3
2386 + dct_trn8_32(p0, p4);
2387 + dct_trn8_32(p1, p5);
2388 + dct_trn8_32(p2, p6);
2389 + dct_trn8_32(p3, p7);
2390 +
2391 + // store
2392 + vst1_u8(out, p0); out += out_stride;
2393 + vst1_u8(out, p1); out += out_stride;
2394 + vst1_u8(out, p2); out += out_stride;
2395 + vst1_u8(out, p3); out += out_stride;
2396 + vst1_u8(out, p4); out += out_stride;
2397 + vst1_u8(out, p5); out += out_stride;
2398 + vst1_u8(out, p6); out += out_stride;
2399 + vst1_u8(out, p7);
2400 +
2401 + #undef dct_trn8_8
2402 + #undef dct_trn8_16
2403 + #undef dct_trn8_32
2404 + }
2405 +
2406 + #undef dct_long_mul
2407 + #undef dct_long_mac
2408 + #undef dct_widen
2409 + #undef dct_wadd
2410 + #undef dct_wsub
2411 + #undef dct_bfly32o
2412 + #undef dct_pass
1365 2413 }
1366 - #endif
2414 +
2415 + #endif // STBI_NEON
1367 2416
1368 2417 #define STBI__MARKER_none 0xff
1369 2418 // if there's a pending marker from the entropy stream, return that
···
1394 2443 j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = 0;
1395 2444 j->marker = STBI__MARKER_none;
1396 2445 j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
2446 + j->eob_run = 0;
1397 2447 // no more than 1<<31 MCUs if no restart_interal?
that's plenty safe,
1398 2448 // since we don't even allow 1<<30 pixels
1399 2449 }
···
1401 2451 static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
1402 2452 {
1403 2453 stbi__jpeg_reset(z);
1404 - if (z->scan_n == 1) {
1405 - int i,j;
1406 - #ifdef STBI_SIMD
1407 - __declspec(align(16))
1408 - #endif
1409 - short data[64];
1410 - int n = z->order[0];
1411 - // non-interleaved data, we just need to process one block at a time,
1412 - // in trivial scanline order
1413 - // number of blocks to do just depends on how many actual "pixels" this
1414 - // component has, independent of interleaved MCU blocking and such
1415 - int w = (z->img_comp[n].x+7) >> 3;
1416 - int h = (z->img_comp[n].y+7) >> 3;
1417 - for (j=0; j < h; ++j) {
1418 - for (i=0; i < w; ++i) {
1419 - if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+z->img_comp[n].ha, n)) return 0;
1420 - #ifdef STBI_SIMD
1421 - stbi__idct_installed(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data, z->dequant2[z->img_comp[n].tq]);
1422 - #else
1423 - stbi__idct_block(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data, z->dequant[z->img_comp[n].tq]);
1424 - #endif
1425 - // every data block is an MCU, so countdown the restart interval
1426 - if (--z->todo <= 0) {
1427 - if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
1428 - // if it's NOT a restart, then just bail, so we get corrupt data
1429 - // rather than no data
1430 - if (!STBI__RESTART(z->marker)) return 1;
1431 - stbi__jpeg_reset(z);
2454 + if (!z->progressive) {
2455 + if (z->scan_n == 1) {
2456 + int i,j;
2457 + STBI_SIMD_ALIGN(short, data[64]);
2458 + int n = z->order[0];
2459 + // non-interleaved data, we just need to process one block at a time,
2460 + // in trivial scanline order
2461 + // number of blocks to do just depends on how many actual "pixels" this
2462 + // component has, independent of interleaved MCU blocking and such
2463 + int w = (z->img_comp[n].x+7) >> 3;
2464
+ int h = (z->img_comp[n].y+7) >> 3;
2465 + for (j=0; j < h; ++j) {
2466 + for (i=0; i < w; ++i) {
2467 + int ha = z->img_comp[n].ha;
2468 + if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2469 + z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
2470 + // every data block is an MCU, so countdown the restart interval
2471 + if (--z->todo <= 0) {
2472 + if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2473 + // if it's NOT a restart, then just bail, so we get corrupt data
2474 + // rather than no data
2475 + if (!STBI__RESTART(z->marker)) return 1;
2476 + stbi__jpeg_reset(z);
2477 + }
1432 2478 }
1433 2479 }
2480 + return 1;
2481 + } else { // interleaved
2482 + int i,j,k,x,y;
2483 + STBI_SIMD_ALIGN(short, data[64]);
2484 + for (j=0; j < z->img_mcu_y; ++j) {
2485 + for (i=0; i < z->img_mcu_x; ++i) {
2486 + // scan an interleaved mcu... process scan_n components in order
2487 + for (k=0; k < z->scan_n; ++k) {
2488 + int n = z->order[k];
2489 + // scan out an mcu's worth of this component; that's just determined
2490 + // by the basic H and V specified for the component
2491 + for (y=0; y < z->img_comp[n].v; ++y) {
2492 + for (x=0; x < z->img_comp[n].h; ++x) {
2493 + int x2 = (i*z->img_comp[n].h + x)*8;
2494 + int y2 = (j*z->img_comp[n].v + y)*8;
2495 + int ha = z->img_comp[n].ha;
2496 + if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
2497 + z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
2498 + }
2499 + }
2500 + }
2501 + // after all interleaved components, that's an interleaved MCU,
2502 + // so now count down the restart interval
2503 + if (--z->todo <= 0) {
2504 + if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2505 + if (!STBI__RESTART(z->marker)) return 1;
2506 +
stbi__jpeg_reset(z);
2507 + }
2508 + }
2509 + }
2510 + return 1;
1434 2511 }
1435 - } else { // interleaved!
1436 - int i,j,k,x,y;
1437 - short data[64];
1438 - for (j=0; j < z->img_mcu_y; ++j) {
1439 - for (i=0; i < z->img_mcu_x; ++i) {
1440 - // scan an interleaved mcu... process scan_n components in order
1441 - for (k=0; k < z->scan_n; ++k) {
1442 - int n = z->order[k];
1443 - // scan out an mcu's worth of this component; that's just determined
1444 - // by the basic H and V specified for the component
1445 - for (y=0; y < z->img_comp[n].v; ++y) {
1446 - for (x=0; x < z->img_comp[n].h; ++x) {
1447 - int x2 = (i*z->img_comp[n].h + x)*8;
1448 - int y2 = (j*z->img_comp[n].v + y)*8;
1449 - if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+z->img_comp[n].ha, n)) return 0;
1450 - #ifdef STBI_SIMD
1451 - stbi__idct_installed(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data, z->dequant2[z->img_comp[n].tq]);
1452 - #else
1453 - stbi__idct_block(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data, z->dequant[z->img_comp[n].tq]);
1454 - #endif
2512 + } else {
2513 + if (z->scan_n == 1) {
2514 + int i,j;
2515 + int n = z->order[0];
2516 + // non-interleaved data, we just need to process one block at a time,
2517 + // in trivial scanline order
2518 + // number of blocks to do just depends on how many actual "pixels" this
2519 + // component has, independent of interleaved MCU blocking and such
2520 + int w = (z->img_comp[n].x+7) >> 3;
2521 + int h = (z->img_comp[n].y+7) >> 3;
2522 + for (j=0; j < h; ++j) {
2523 + for (i=0; i < w; ++i) {
2524 + short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2525 + if (z->spec_start == 0) {
2526 + if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2527 + return 0;
2528 + } else {
2529 + int ha = z->img_comp[n].ha;
2530 + if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
2531 + return 0;
2532 + }
2533 + // every data block is an MCU, so countdown the restart interval
2534 + if (--z->todo <= 0) {
2535 + if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2536 + if (!STBI__RESTART(z->marker)) return 1;
2537 + stbi__jpeg_reset(z);
2538 + }
2539 + }
2540 + }
2541 + return 1;
2542 + } else { // interleaved
2543 + int i,j,k,x,y;
2544 + for (j=0; j < z->img_mcu_y; ++j) {
2545 + for (i=0; i < z->img_mcu_x; ++i) {
2546 + // scan an interleaved mcu... process scan_n components in order
2547 + for (k=0; k < z->scan_n; ++k) {
2548 + int n = z->order[k];
2549 + // scan out an mcu's worth of this component; that's just determined
2550 + // by the basic H and V specified for the component
2551 + for (y=0; y < z->img_comp[n].v; ++y) {
2552 + for (x=0; x < z->img_comp[n].h; ++x) {
2553 + int x2 = (i*z->img_comp[n].h + x);
2554 + int y2 = (j*z->img_comp[n].v + y);
2555 + short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
2556 + if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
2557 + return 0;
2558 + }
1455 2559 }
1456 2560 }
2561 + // after all interleaved components, that's an interleaved MCU,
2562 + // so now count down the restart interval
2563 + if (--z->todo <= 0) {
2564 + if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
2565 + if (!STBI__RESTART(z->marker)) return 1;
2566 + stbi__jpeg_reset(z);
2567 + }
1457 2568 }
1458 - // after all interleaved components, that's an interleaved MCU,
1459 - // so now count down the restart interval
1460 - if (--z->todo <= 0) {
1461 - if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
1462 - // if it's NOT a restart, then just bail, so we get corrupt data
1463 - // rather than no data
1464 - if (!STBI__RESTART(z->marker)) return 1;
1465 - stbi__jpeg_reset(z);
2569 + }
2570 + return 1;
2571 + }
2572 + }
2573 + }
2574 +
2575 + static void stbi__jpeg_dequantize(short *data, stbi_uc *dequant)
2576 + {
2577 + int i;
2578 + for (i=0; i < 64; ++i)
2579 + data[i] *=
dequant[i];
2580 + }
2581 +
2582 + static void stbi__jpeg_finish(stbi__jpeg *z)
2583 + {
2584 + if (z->progressive) {
2585 + // dequantize and idct the data
2586 + int i,j,n;
2587 + for (n=0; n < z->s->img_n; ++n) {
2588 + int w = (z->img_comp[n].x+7) >> 3;
2589 + int h = (z->img_comp[n].y+7) >> 3;
2590 + for (j=0; j < h; ++j) {
2591 + for (i=0; i < w; ++i) {
2592 + short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
2593 + stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
2594 + z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
1466 2595 }
1467 2596 }
1468 2597 }
1469 2598 }
1470 - return 1;
1471 2599 }
1472 2600
1473 2601 static int stbi__process_marker(stbi__jpeg *z, int m)
···
1477 2605 case STBI__MARKER_none: // no marker found
1478 2606 return stbi__err("expected marker","Corrupt JPEG");
1479 2607
1480 - case 0xC2: // stbi__SOF - progressive
1481 - return stbi__err("progressive jpeg","JPEG format not supported (progressive)");
1482 -
1483 2608 case 0xDD: // DRI - specify restart interval
1484 2609 if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
1485 2610 z->restart_interval = stbi__get16be(z->s);
···
1495 2620 if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
1496 2621 for (i=0; i < 64; ++i)
1497 2622 z->dequant[t][stbi__jpeg_dezigzag[i]] = stbi__get8(z->s);
1498 - #ifdef STBI_SIMD
1499 - for (i=0; i < 64; ++i)
1500 - z->dequant2[t][i] = z->dequant[t][i];
1501 - #endif
1502 2623 L -= 65;
1503 2624 }
1504 2625 return L==0;
···
1526 2647 }
1527 2648 for (i=0; i < n; ++i)
1528 2649 v[i] = stbi__get8(z->s);
2650 + if (tc != 0)
2651 + stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
1529 2652 L -= n;
1530 2653 }
1531 2654 return L==0;
···
1538 2661 return 0;
1539 2662 }
1540 2663
1541 - // after we see stbi__SOS
2664 + // after we see SOS
1542 2665 static int stbi__process_scan_header(stbi__jpeg *z)
1543 2666 {
1544 2667 int i;
1545 2668 int
Ls = stbi__get16be(z->s);
1546 2669 z->scan_n = stbi__get8(z->s);
1547 - if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad stbi__SOS component count","Corrupt JPEG");
1548 - if (Ls != 6+2*z->scan_n) return stbi__err("bad stbi__SOS len","Corrupt JPEG");
2670 + if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
2671 + if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
1549 2672 for (i=0; i < z->scan_n; ++i) {
1550 2673 int id = stbi__get8(z->s), which;
1551 2674 int q = stbi__get8(z->s);
1552 2675 for (which = 0; which < z->s->img_n; ++which)
1553 2676 if (z->img_comp[which].id == id)
1554 2677 break;
1555 - if (which == z->s->img_n) return 0;
2678 + if (which == z->s->img_n) return 0; // no match
1556 2679 z->img_comp[which].hd = q >> 4; if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
1557 2680 z->img_comp[which].ha = q & 15; if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
1558 2681 z->order[i] = which;
1559 2682 }
1560 - if (stbi__get8(z->s) != 0) return stbi__err("bad stbi__SOS","Corrupt JPEG");
1561 - stbi__get8(z->s); // should be 63, but might be 0
1562 - if (stbi__get8(z->s) != 0) return stbi__err("bad stbi__SOS","Corrupt JPEG");
2683 +
2684 + {
2685 + int aa;
2686 + z->spec_start = stbi__get8(z->s);
2687 + z->spec_end = stbi__get8(z->s); // should be 63, but might be 0
2688 + aa = stbi__get8(z->s);
2689 + z->succ_high = (aa >> 4);
2690 + z->succ_low = (aa & 15);
2691 + if (z->progressive) {
2692 + if (z->spec_start > 63 || z->spec_end > 63 || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
2693 + return stbi__err("bad SOS", "Corrupt JPEG");
2694 + } else {
2695 + if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
2696 + if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
2697 +
z->spec_end = 63;
2698 + }
2699 + }
1563 2700
1564 2701 return 1;
1565 2702 }
···
1568 2705 {
1569 2706 stbi__context *s = z->s;
1570 2707 int Lf,p,i,q, h_max=1,v_max=1,c;
1571 - Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad stbi__SOF len","Corrupt JPEG"); // JPEG
1572 - p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
2708 + Lf = stbi__get16be(s); if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
2709 + p = stbi__get8(s); if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
1573 2710 s->img_y = stbi__get16be(s); if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
1574 2711 s->img_x = stbi__get16be(s); if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
1575 2712 c = stbi__get8(s);
···
1580 2717 z->img_comp[i].linebuf = NULL;
1581 2718 }
1582 2719
1583 - if (Lf != 8+3*s->img_n) return stbi__err("bad stbi__SOF len","Corrupt JPEG");
2720 + if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
1584 2721
1585 2722 for (i=0; i < s->img_n; ++i) {
1586 2723 z->img_comp[i].id = stbi__get8(s);
···
1593 2730 z->img_comp[i].tq = stbi__get8(s); if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
1594 2731 }
1595 2732
1596 - if (scan != SCAN_load) return 1;
2733 + if (scan != STBI__SCAN_load) return 1;
1597 2734
1598 2735 if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
1599 2736
···
1611 2748 z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
1612 2749
1613 2750 for (i=0; i < s->img_n; ++i) {
1614 - // number of effective pixels (stbi__err.g. for non-interleaved MCU)
2751 + // number of effective pixels (e.g.
for non-interleaved MCU)
1615 2752 z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
1616 2753 z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
1617 2754 // to simplify generation, we'll allocate enough memory to decode
1618 2755 // the bogus oversized data from using interleaved MCUs and their
1619 - // big blocks (stbi__err.g. a 16x16 iMCU on an image of width 33); we won't
2756 + // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
1620 2757 // discard the extra data until colorspace conversion
1621 2758 z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
1622 2759 z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
1623 2760 z->img_comp[i].raw_data = stbi__malloc(z->img_comp[i].w2 * z->img_comp[i].h2+15);
2761 +
1624 2762 if (z->img_comp[i].raw_data == NULL) {
1625 2763 for(--i; i >= 0; --i) {
1626 - free(z->img_comp[i].raw_data);
1627 - z->img_comp[i].data = NULL;
2764 + STBI_FREE(z->img_comp[i].raw_data);
2765 + z->img_comp[i].raw_data = NULL;
1628 2766 }
1629 2767 return stbi__err("outofmem", "Out of memory");
1630 2768 }
1631 - // align blocks for installable-idct using mmx/sse
2769 + // align blocks for idct using mmx/sse
1632 2770 z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
1633 2771 z->img_comp[i].linebuf = NULL;
2772 + if (z->progressive) {
2773 + z->img_comp[i].coeff_w = (z->img_comp[i].w2 + 7) >> 3;
2774 + z->img_comp[i].coeff_h = (z->img_comp[i].h2 + 7) >> 3;
2775 + z->img_comp[i].raw_coeff = STBI_MALLOC(z->img_comp[i].coeff_w * z->img_comp[i].coeff_h * 64 * sizeof(short) + 15);
2776 + z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
2777 + } else {
2778 + z->img_comp[i].coeff = 0;
2779 + z->img_comp[i].raw_coeff = 0;
2780 + }
1634 2781 }
1635 2782
1636 2783 return 1;
1637 2784 }
1638 2785
1639 - // use comparisons since in some cases we handle more than one case (stbi__err.g.
stbi__SOF)
2786 + // use comparisons since in some cases we handle more than one case (e.g. SOF)
1640 2787 #define stbi__DNL(x) ((x) == 0xdc)
1641 2788 #define stbi__SOI(x) ((x) == 0xd8)
1642 2789 #define stbi__EOI(x) ((x) == 0xd9)
1643 - #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1)
2790 + #define stbi__SOF(x) ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
1644 2791 #define stbi__SOS(x) ((x) == 0xda)
1645 2792
1646 - static int decode_jpeg_header(stbi__jpeg *z, int scan)
2793 + #define stbi__SOF_progressive(x) ((x) == 0xc2)
2794 +
2795 + static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
1647 2796 {
1648 2797 int m;
1649 2798 z->marker = STBI__MARKER_none; // initialize cached marker to empty
1650 2799 m = stbi__get_marker(z);
1651 - if (!stbi__SOI(m)) return stbi__err("no stbi__SOI","Corrupt JPEG");
1652 - if (scan == SCAN_type) return 1;
2800 + if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
2801 + if (scan == STBI__SCAN_type) return 1;
1653 2802 m = stbi__get_marker(z);
1654 2803 while (!stbi__SOF(m)) {
1655 2804 if (!stbi__process_marker(z,m)) return 0;
1656 2805 m = stbi__get_marker(z);
1657 2806 while (m == STBI__MARKER_none) {
1658 2807 // some files have extra padding after their blocks, so ok, we'll scan
1659 - if (stbi__at_eof(z->s)) return stbi__err("no stbi__SOF", "Corrupt JPEG");
2808 + if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
1660 2809 m = stbi__get_marker(z);
1661 2810 }
1662 2811 }
2812 + z->progressive = stbi__SOF_progressive(m);
1663 2813 if (!stbi__process_frame_header(z, scan)) return 0;
1664 2814 return 1;
1665 2815 }
1666 2816
1667 - static int decode_jpeg_image(stbi__jpeg *j)
2817 + // decode image to YCbCr format
2818 + static int stbi__decode_jpeg_image(stbi__jpeg *j)
1668 2819 {
1669 2820 int m;
2821 + for (m = 0; m < 4; m++) {
2822 + j->img_comp[m].raw_data = NULL;
2823 + j->img_comp[m].raw_coeff = NULL;
2824 + }
1670 2825 j->restart_interval = 0;
1671 - if (!decode_jpeg_header(j,
SCAN_load)) return 0; 2826 + if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0; 1672 2827 m = stbi__get_marker(j); 1673 2828 while (!stbi__EOI(m)) { 1674 2829 if (stbi__SOS(m)) { ··· 1682 2837 j->marker = stbi__get8(j->s); 1683 2838 break; 1684 2839 } else if (x != 0) { 1685 - return 0; 2840 + return stbi__err("junk before marker", "Corrupt JPEG"); 1686 2841 } 1687 2842 } 1688 2843 // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0 ··· 1692 2847 } 1693 2848 m = stbi__get_marker(j); 1694 2849 } 2850 + if (j->progressive) 2851 + stbi__jpeg_finish(j); 1695 2852 return 1; 1696 2853 } 1697 2854 ··· 1775 2932 return out; 1776 2933 } 1777 2934 2935 + #if defined(STBI_SSE2) || defined(STBI_NEON) 2936 + static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) 2937 + { 2938 + // need to generate 2x2 samples for every one in input 2939 + int i=0,t0,t1; 2940 + 2941 + if (w == 1) { 2942 + out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2); 2943 + return out; 2944 + } 2945 + 2946 + t1 = 3*in_near[0] + in_far[0]; 2947 + // process groups of 8 pixels for as long as we can. 2948 + // note we can't handle the last pixel in a row in this loop 2949 + // because we need to handle the filter boundary conditions. 
2950 + for (; i < ((w-1) & ~7); i += 8) { 2951 + #if defined(STBI_SSE2) 2952 + // load and perform the vertical filtering pass 2953 + // this uses 3*x + y = 4*x + (y - x) 2954 + __m128i zero = _mm_setzero_si128(); 2955 + __m128i farb = _mm_loadl_epi64((__m128i *) (in_far + i)); 2956 + __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i)); 2957 + __m128i farw = _mm_unpacklo_epi8(farb, zero); 2958 + __m128i nearw = _mm_unpacklo_epi8(nearb, zero); 2959 + __m128i diff = _mm_sub_epi16(farw, nearw); 2960 + __m128i nears = _mm_slli_epi16(nearw, 2); 2961 + __m128i curr = _mm_add_epi16(nears, diff); // current row 2962 + 2963 + // horizontal filter works the same based on shifted vers of current 2964 + // row. "prev" is current row shifted right by 1 pixel; we need to 2965 + // insert the previous pixel value (from t1). 2966 + // "next" is current row shifted left by 1 pixel, with first pixel 2967 + // of next block of 8 pixels added in. 2968 + __m128i prv0 = _mm_slli_si128(curr, 2); 2969 + __m128i nxt0 = _mm_srli_si128(curr, 2); 2970 + __m128i prev = _mm_insert_epi16(prv0, t1, 0); 2971 + __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7); 2972 + 2973 + // horizontal filter, polyphase implementation since it's convenient: 2974 + // even pixels = 3*cur + prev = cur*4 + (prev - cur) 2975 + // odd pixels = 3*cur + next = cur*4 + (next - cur) 2976 + // note the shared term. 2977 + __m128i bias = _mm_set1_epi16(8); 2978 + __m128i curs = _mm_slli_epi16(curr, 2); 2979 + __m128i prvd = _mm_sub_epi16(prev, curr); 2980 + __m128i nxtd = _mm_sub_epi16(next, curr); 2981 + __m128i curb = _mm_add_epi16(curs, bias); 2982 + __m128i even = _mm_add_epi16(prvd, curb); 2983 + __m128i odd = _mm_add_epi16(nxtd, curb); 2984 + 2985 + // interleave even and odd pixels, then undo scaling. 
2986 + __m128i int0 = _mm_unpacklo_epi16(even, odd); 2987 + __m128i int1 = _mm_unpackhi_epi16(even, odd); 2988 + __m128i de0 = _mm_srli_epi16(int0, 4); 2989 + __m128i de1 = _mm_srli_epi16(int1, 4); 2990 + 2991 + // pack and write output 2992 + __m128i outv = _mm_packus_epi16(de0, de1); 2993 + _mm_storeu_si128((__m128i *) (out + i*2), outv); 2994 + #elif defined(STBI_NEON) 2995 + // load and perform the vertical filtering pass 2996 + // this uses 3*x + y = 4*x + (y - x) 2997 + uint8x8_t farb = vld1_u8(in_far + i); 2998 + uint8x8_t nearb = vld1_u8(in_near + i); 2999 + int16x8_t diff = vreinterpretq_s16_u16(vsubl_u8(farb, nearb)); 3000 + int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2)); 3001 + int16x8_t curr = vaddq_s16(nears, diff); // current row 3002 + 3003 + // horizontal filter works the same based on shifted vers of current 3004 + // row. "prev" is current row shifted right by 1 pixel; we need to 3005 + // insert the previous pixel value (from t1). 3006 + // "next" is current row shifted left by 1 pixel, with first pixel 3007 + // of next block of 8 pixels added in. 3008 + int16x8_t prv0 = vextq_s16(curr, curr, 7); 3009 + int16x8_t nxt0 = vextq_s16(curr, curr, 1); 3010 + int16x8_t prev = vsetq_lane_s16(t1, prv0, 0); 3011 + int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7); 3012 + 3013 + // horizontal filter, polyphase implementation since it's convenient: 3014 + // even pixels = 3*cur + prev = cur*4 + (prev - cur) 3015 + // odd pixels = 3*cur + next = cur*4 + (next - cur) 3016 + // note the shared term. 
3017 + int16x8_t curs = vshlq_n_s16(curr, 2); 3018 + int16x8_t prvd = vsubq_s16(prev, curr); 3019 + int16x8_t nxtd = vsubq_s16(next, curr); 3020 + int16x8_t even = vaddq_s16(curs, prvd); 3021 + int16x8_t odd = vaddq_s16(curs, nxtd); 3022 + 3023 + // undo scaling and round, then store with even/odd phases interleaved 3024 + uint8x8x2_t o; 3025 + o.val[0] = vqrshrun_n_s16(even, 4); 3026 + o.val[1] = vqrshrun_n_s16(odd, 4); 3027 + vst2_u8(out + i*2, o); 3028 + #endif 3029 + 3030 + // "previous" value for next iter 3031 + t1 = 3*in_near[i+7] + in_far[i+7]; 3032 + } 3033 + 3034 + t0 = t1; 3035 + t1 = 3*in_near[i] + in_far[i]; 3036 + out[i*2] = stbi__div16(3*t1 + t0 + 8); 3037 + 3038 + for (++i; i < w; ++i) { 3039 + t0 = t1; 3040 + t1 = 3*in_near[i]+in_far[i]; 3041 + out[i*2-1] = stbi__div16(3*t0 + t1 + 8); 3042 + out[i*2 ] = stbi__div16(3*t1 + t0 + 8); 3043 + } 3044 + out[w*2-1] = stbi__div4(t1+2); 3045 + 3046 + STBI_NOTUSED(hs); 3047 + 3048 + return out; 3049 + } 3050 + #endif 3051 + 1778 3052 static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs) 1779 3053 { 1780 3054 // resample with nearest-neighbor ··· 1786 3060 return out; 1787 3061 } 1788 3062 3063 + #ifdef STBI_JPEG_OLD 3064 + // this is the same YCbCr-to-RGB calculation that stb_image has used 3065 + // historically before the algorithm changes in 1.49 1789 3066 #define float2fixed(x) ((int) ((x) * 65536 + 0.5)) 1790 - 1791 - // 0.38 seconds on 3*anemones.jpg (0.25 with processor = Pro) 1792 - // VC6 without processor=Pro is generating multiple LEAs per multiply! 
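The body of the legacy `STBI_JPEG_OLD` routine is elided in this diff. As a rough illustration only, a 16.16 fixed-point YCbCr-to-RGB conversion built on the `float2fixed` scaling above (multiply by 65536, add 0.5) could look like the sketch below. It assumes the same four coefficients that appear in the new code path of this change. `FLOAT2FIXED16`, `clamp_fixed16`, and `ycbcr_to_rgb_pixel` are hypothetical names chosen for this example and are not stb_image functions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper names; not part of stb_image. */

/* Convert a real coefficient to 16.16 fixed point, rounding to nearest
   (same scaling as the float2fixed macro in the diff). */
#define FLOAT2FIXED16(x) ((int) ((x) * 65536 + 0.5))

/* Descale a 16.16 value to an 8-bit channel, clamping to 0..255.
   The sign is tested before shifting so no negative value is shifted. */
static unsigned char clamp_fixed16(int v)
{
   if (v < 0) return 0;
   v >>= 16;
   return (unsigned char) (v > 255 ? 255 : v);
}

/* Convert one YCbCr sample triple (each 0..255) to RGB, writing 3 bytes. */
static void ycbcr_to_rgb_pixel(unsigned char *out, int y, int cb, int cr)
{
   int y_fixed = (y << 16) + 32768;   /* +0.5 in 16.16, so the shift rounds */
   assert(out != NULL);
   cb -= 128;                         /* chroma is stored with a +128 bias */
   cr -= 128;
   out[0] = clamp_fixed16(y_fixed + cr * FLOAT2FIXED16(1.40200f));
   out[1] = clamp_fixed16(y_fixed - cr * FLOAT2FIXED16(0.71414f)
                                  - cb * FLOAT2FIXED16(0.34414f));
   out[2] = clamp_fixed16(y_fixed + cb * FLOAT2FIXED16(1.77200f));
}
```

With neutral chroma (Cb = Cr = 128) both chroma terms vanish, so a luma of 128 maps to the gray pixel (128, 128, 128). Out-of-range results are clamped rather than wrapped.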
1793 3067 static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step) 1794 3068 { 1795 3069 int i; ··· 1814 3088 out += step; 1815 3089 } 1816 3090 } 1817 - 1818 - #ifdef STBI_SIMD 1819 - static stbi_YCbCr_to_RGB_run stbi__YCbCr_installed = stbi__YCbCr_to_RGB_row; 3091 + #else 3092 + // this is a reduced-precision calculation of YCbCr-to-RGB introduced 3093 + // to make sure the code produces the same results in both SIMD and scalar 3094 + #define float2fixed(x) (((int) ((x) * 4096.0f + 0.5f)) << 8) 3095 + static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step) 3096 + { 3097 + int i; 3098 + for (i=0; i < count; ++i) { 3099 + int y_fixed = (y[i] << 20) + (1<<19); // rounding 3100 + int r,g,b; 3101 + int cr = pcr[i] - 128; 3102 + int cb = pcb[i] - 128; 3103 + r = y_fixed + cr* float2fixed(1.40200f); 3104 + g = y_fixed + (cr*-float2fixed(0.71414f)) + ((cb*-float2fixed(0.34414f)) & 0xffff0000); 3105 + b = y_fixed + cb* float2fixed(1.77200f); 3106 + r >>= 20; 3107 + g >>= 20; 3108 + b >>= 20; 3109 + if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; } 3110 + if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; } 3111 + if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; } 3112 + out[0] = (stbi_uc)r; 3113 + out[1] = (stbi_uc)g; 3114 + out[2] = (stbi_uc)b; 3115 + out[3] = 255; 3116 + out += step; 3117 + } 3118 + } 3119 + #endif 1820 3120 1821 - STBIDEF void stbi_install_YCbCr_to_RGB(stbi_YCbCr_to_RGB_run func) 3121 + #if defined(STBI_SSE2) || defined(STBI_NEON) 3122 + static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step) 1822 3123 { 1823 - stbi__YCbCr_installed = func; 3124 + int i = 0; 3125 + 3126 + #ifdef STBI_SSE2 3127 + // step == 3 is pretty ugly on the final interleave, and i'm not convinced 3128 + // it's useful in practice (you 
wouldn't use it for textures, for example). 3129 + // so just accelerate step == 4 case. 3130 + if (step == 4) { 3131 + // this is a fairly straightforward implementation and not super-optimized. 3132 + __m128i signflip = _mm_set1_epi8(-0x80); 3133 + __m128i cr_const0 = _mm_set1_epi16( (short) ( 1.40200f*4096.0f+0.5f)); 3134 + __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f)); 3135 + __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f)); 3136 + __m128i cb_const1 = _mm_set1_epi16( (short) ( 1.77200f*4096.0f+0.5f)); 3137 + __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128); 3138 + __m128i xw = _mm_set1_epi16(255); // alpha channel 3139 + 3140 + for (; i+7 < count; i += 8) { 3141 + // load 3142 + __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i)); 3143 + __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i)); 3144 + __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i)); 3145 + __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128 3146 + __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128 3147 + 3148 + // unpack to short (and left-shift cr, cb by 8) 3149 + __m128i yw = _mm_unpacklo_epi8(y_bias, y_bytes); 3150 + __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased); 3151 + __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased); 3152 + 3153 + // color transform 3154 + __m128i yws = _mm_srli_epi16(yw, 4); 3155 + __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw); 3156 + __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw); 3157 + __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1); 3158 + __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1); 3159 + __m128i rws = _mm_add_epi16(cr0, yws); 3160 + __m128i gwt = _mm_add_epi16(cb0, yws); 3161 + __m128i bws = _mm_add_epi16(yws, cb1); 3162 + __m128i gws = _mm_add_epi16(gwt, cr1); 3163 + 3164 + // descale 3165 + __m128i rw = _mm_srai_epi16(rws, 4); 3166 + __m128i bw = _mm_srai_epi16(bws, 4); 3167 + __m128i gw = _mm_srai_epi16(gws, 4); 3168 + 3169 + // back to byte, 
set up for transpose 3170 + __m128i brb = _mm_packus_epi16(rw, bw); 3171 + __m128i gxb = _mm_packus_epi16(gw, xw); 3172 + 3173 + // transpose to interleave channels 3174 + __m128i t0 = _mm_unpacklo_epi8(brb, gxb); 3175 + __m128i t1 = _mm_unpackhi_epi8(brb, gxb); 3176 + __m128i o0 = _mm_unpacklo_epi16(t0, t1); 3177 + __m128i o1 = _mm_unpackhi_epi16(t0, t1); 3178 + 3179 + // store 3180 + _mm_storeu_si128((__m128i *) (out + 0), o0); 3181 + _mm_storeu_si128((__m128i *) (out + 16), o1); 3182 + out += 32; 3183 + } 3184 + } 3185 + #endif 3186 + 3187 + #ifdef STBI_NEON 3188 + // in this version, step=3 support would be easy to add. but is there demand? 3189 + if (step == 4) { 3190 + // this is a fairly straightforward implementation and not super-optimized. 3191 + uint8x8_t signflip = vdup_n_u8(0x80); 3192 + int16x8_t cr_const0 = vdupq_n_s16( (short) ( 1.40200f*4096.0f+0.5f)); 3193 + int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f)); 3194 + int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f)); 3195 + int16x8_t cb_const1 = vdupq_n_s16( (short) ( 1.77200f*4096.0f+0.5f)); 3196 + 3197 + for (; i+7 < count; i += 8) { 3198 + // load 3199 + uint8x8_t y_bytes = vld1_u8(y + i); 3200 + uint8x8_t cr_bytes = vld1_u8(pcr + i); 3201 + uint8x8_t cb_bytes = vld1_u8(pcb + i); 3202 + int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip)); 3203 + int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip)); 3204 + 3205 + // expand to s16 3206 + int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4)); 3207 + int16x8_t crw = vshll_n_s8(cr_biased, 7); 3208 + int16x8_t cbw = vshll_n_s8(cb_biased, 7); 3209 + 3210 + // color transform 3211 + int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0); 3212 + int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0); 3213 + int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1); 3214 + int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1); 3215 + int16x8_t rws = vaddq_s16(yws, cr0); 3216 + int16x8_t gws = 
vaddq_s16(vaddq_s16(yws, cb0), cr1); 3217 + int16x8_t bws = vaddq_s16(yws, cb1); 3218 + 3219 + // undo scaling, round, convert to byte 3220 + uint8x8x4_t o; 3221 + o.val[0] = vqrshrun_n_s16(rws, 4); 3222 + o.val[1] = vqrshrun_n_s16(gws, 4); 3223 + o.val[2] = vqrshrun_n_s16(bws, 4); 3224 + o.val[3] = vdup_n_u8(255); 3225 + 3226 + // store, interleaving r/g/b/a 3227 + vst4_u8(out, o); 3228 + out += 8*4; 3229 + } 3230 + } 3231 + #endif 3232 + 3233 + for (; i < count; ++i) { 3234 + int y_fixed = (y[i] << 20) + (1<<19); // rounding 3235 + int r,g,b; 3236 + int cr = pcr[i] - 128; 3237 + int cb = pcb[i] - 128; 3238 + r = y_fixed + cr* float2fixed(1.40200f); 3239 + g = y_fixed + cr*-float2fixed(0.71414f) + ((cb*-float2fixed(0.34414f)) & 0xffff0000); 3240 + b = y_fixed + cb* float2fixed(1.77200f); 3241 + r >>= 20; 3242 + g >>= 20; 3243 + b >>= 20; 3244 + if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; } 3245 + if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; } 3246 + if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; } 3247 + out[0] = (stbi_uc)r; 3248 + out[1] = (stbi_uc)g; 3249 + out[2] = (stbi_uc)b; 3250 + out[3] = 255; 3251 + out += step; 3252 + } 1824 3253 } 1825 3254 #endif 1826 3255 3256 + // set up the kernels 3257 + static void stbi__setup_jpeg(stbi__jpeg *j) 3258 + { 3259 + j->idct_block_kernel = stbi__idct_block; 3260 + j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row; 3261 + j->resample_row_hv_2_kernel = stbi__resample_row_hv_2; 3262 + 3263 + #ifdef STBI_SSE2 3264 + if (stbi__sse2_available()) { 3265 + j->idct_block_kernel = stbi__idct_simd; 3266 + #ifndef STBI_JPEG_OLD 3267 + j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd; 3268 + #endif 3269 + j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd; 3270 + } 3271 + #endif 3272 + 3273 + #ifdef STBI_NEON 3274 + j->idct_block_kernel = stbi__idct_simd; 3275 + #ifndef STBI_JPEG_OLD 3276 + j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd; 3277 + #endif 3278 + j->resample_row_hv_2_kernel = 
stbi__resample_row_hv_2_simd; 3279 + #endif 3280 + } 1827 3281 1828 3282 // clean up the temporary component buffers 1829 3283 static void stbi__cleanup_jpeg(stbi__jpeg *j) ··· 1831 3285 int i; 1832 3286 for (i=0; i < j->s->img_n; ++i) { 1833 3287 if (j->img_comp[i].raw_data) { 1834 - free(j->img_comp[i].raw_data); 3288 + STBI_FREE(j->img_comp[i].raw_data); 1835 3289 j->img_comp[i].raw_data = NULL; 1836 3290 j->img_comp[i].data = NULL; 1837 3291 } 3292 + if (j->img_comp[i].raw_coeff) { 3293 + STBI_FREE(j->img_comp[i].raw_coeff); 3294 + j->img_comp[i].raw_coeff = 0; 3295 + j->img_comp[i].coeff = 0; 3296 + } 1838 3297 if (j->img_comp[i].linebuf) { 1839 - free(j->img_comp[i].linebuf); 3298 + STBI_FREE(j->img_comp[i].linebuf); 1840 3299 j->img_comp[i].linebuf = NULL; 1841 3300 } 1842 3301 } ··· 1847 3306 resample_row_func resample; 1848 3307 stbi_uc *line0,*line1; 1849 3308 int hs,vs; // expansion factor in each axis 1850 - int w_lores; // horizontal pixels pre-expansion 3309 + int w_lores; // horizontal pixels pre-expansion 1851 3310 int ystep; // how far through vertical expansion we are 1852 3311 int ypos; // which pre-expansion row we're on 1853 3312 } stbi__resample; ··· 1860 3319 // validate req_comp 1861 3320 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error"); 1862 3321 1863 - // load a jpeg image from whichever source 1864 - if (!decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; } 3322 + // load a jpeg image from whichever source, but leave in YCbCr format 3323 + if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; } 1865 3324 1866 3325 // determine actual number of components to generate 1867 3326 n = req_comp ? 
req_comp : z->s->img_n; ··· 1898 3357 if (r->hs == 1 && r->vs == 1) r->resample = resample_row_1; 1899 3358 else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2; 1900 3359 else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2; 1901 - else if (r->hs == 2 && r->vs == 2) r->resample = stbi__resample_row_hv_2; 3360 + else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel; 1902 3361 else r->resample = stbi__resample_row_generic; 1903 3362 } 1904 3363 ··· 1926 3385 if (n >= 3) { 1927 3386 stbi_uc *y = coutput[0]; 1928 3387 if (z->s->img_n == 3) { 1929 - #ifdef STBI_SIMD 1930 - stbi__YCbCr_installed(out, y, coutput[1], coutput[2], z->s->img_x, n); 1931 - #else 1932 - stbi__YCbCr_to_RGB_row(out, y, coutput[1], coutput[2], z->s->img_x, n); 1933 - #endif 3388 + z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n); 1934 3389 } else 1935 3390 for (i=0; i < z->s->img_x; ++i) { 1936 3391 out[0] = out[1] = out[2] = y[i]; ··· 1957 3412 { 1958 3413 stbi__jpeg j; 1959 3414 j.s = s; 3415 + stbi__setup_jpeg(&j); 1960 3416 return load_jpeg_image(&j, x,y,comp,req_comp); 1961 3417 } 1962 3418 ··· 1965 3421 int r; 1966 3422 stbi__jpeg j; 1967 3423 j.s = s; 1968 - r = decode_jpeg_header(&j, SCAN_type); 3424 + stbi__setup_jpeg(&j); 3425 + r = stbi__decode_jpeg_header(&j, STBI__SCAN_type); 1969 3426 stbi__rewind(s); 1970 3427 return r; 1971 3428 } 1972 3429 1973 3430 static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp) 1974 3431 { 1975 - if (!decode_jpeg_header(j, SCAN_header)) { 3432 + if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) { 1976 3433 stbi__rewind( j->s ); 1977 3434 return 0; 1978 3435 } ··· 1988 3445 j.s = s; 1989 3446 return stbi__jpeg_info_raw(&j, x, y, comp); 1990 3447 } 3448 + #endif 1991 3449 1992 3450 // public domain zlib decode v0.2 Sean Barrett 2006-11-18 1993 3451 // simple implementation ··· 1995 3453 // - all output is written to a single output buffer (can 
malloc/realloc) 1996 3454 // performance 1997 3455 // - fast huffman 3456 + 3457 + #ifndef STBI_NO_ZLIB 1998 3458 1999 3459 // fast-way is faster to check than jpeg huffman, but slow way is slower 2000 3460 #define STBI__ZFAST_BITS 9 // accelerate all cases in default tables ··· 2009 3469 int maxcode[17]; 2010 3470 stbi__uint16 firstsymbol[16]; 2011 3471 stbi_uc size[288]; 2012 - stbi__uint16 value[288]; 3472 + stbi__uint16 value[288]; 2013 3473 } stbi__zhuffman; 2014 3474 2015 3475 stbi_inline static int stbi__bitreverse16(int n) ··· 2025 3485 { 2026 3486 STBI_ASSERT(bits <= 16); 2027 3487 // to bit reverse n bits, reverse 16 and shift 2028 - // stbi__err.g. 11 bits, bit reverse and shift away 5 3488 + // e.g. 11 bits, bit reverse and shift away 5 2029 3489 return stbi__bitreverse16(v) >> (16-bits); 2030 3490 } 2031 3491 ··· 2036 3496 2037 3497 // DEFLATE spec for generating codes 2038 3498 memset(sizes, 0, sizeof(sizes)); 2039 - memset(z->fast, 255, sizeof(z->fast)); 2040 - for (i=0; i < num; ++i) 3499 + memset(z->fast, 0, sizeof(z->fast)); 3500 + for (i=0; i < num; ++i) 2041 3501 ++sizes[sizelist[i]]; 2042 3502 sizes[0] = 0; 2043 3503 for (i=1; i < 16; ++i) 2044 - STBI_ASSERT(sizes[i] <= (1 << i)); 3504 + if (sizes[i] > (1 << i)) 3505 + return stbi__err("bad sizes", "Corrupt PNG"); 2045 3506 code = 0; 2046 3507 for (i=1; i < 16; ++i) { 2047 3508 next_code[i] = code; ··· 2049 3510 z->firstsymbol[i] = (stbi__uint16) k; 2050 3511 code = (code + sizes[i]); 2051 3512 if (sizes[i]) 2052 - if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt JPEG"); 3513 + if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG"); 2053 3514 z->maxcode[i] = code << (16-i); // preshift for inner loop 2054 3515 code <<= 1; 2055 3516 k += sizes[i]; ··· 2059 3520 int s = sizelist[i]; 2060 3521 if (s) { 2061 3522 int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s]; 3523 + stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i); 2062 3524 z->size [c] = 
(stbi_uc ) s; 2063 3525 z->value[c] = (stbi__uint16) i; 2064 3526 if (s <= STBI__ZFAST_BITS) { 2065 - int k = stbi__bit_reverse(next_code[s],s); 2066 - while (k < (1 << STBI__ZFAST_BITS)) { 2067 - z->fast[k] = (stbi__uint16) c; 2068 - k += (1 << s); 3527 + int j = stbi__bit_reverse(next_code[s],s); 3528 + while (j < (1 << STBI__ZFAST_BITS)) { 3529 + z->fast[j] = fastv; 3530 + j += (1 << s); 2069 3531 } 2070 3532 } 2071 3533 ++next_code[s]; ··· 2104 3566 { 2105 3567 do { 2106 3568 STBI_ASSERT(z->code_buffer < (1U << z->num_bits)); 2107 - z->code_buffer |= stbi__zget8(z) << z->num_bits; 3569 + z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits; 2108 3570 z->num_bits += 8; 2109 3571 } while (z->num_bits <= 24); 2110 3572 } ··· 2116 3578 k = z->code_buffer & ((1 << n) - 1); 2117 3579 z->code_buffer >>= n; 2118 3580 z->num_bits -= n; 2119 - return k; 3581 + return k; 2120 3582 } 2121 3583 2122 - stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z) 3584 + static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z) 2123 3585 { 2124 3586 int b,s,k; 2125 - if (a->num_bits < 16) stbi__fill_bits(a); 2126 - b = z->fast[a->code_buffer & STBI__ZFAST_MASK]; 2127 - if (b < 0xffff) { 2128 - s = z->size[b]; 2129 - a->code_buffer >>= s; 2130 - a->num_bits -= s; 2131 - return z->value[b]; 2132 - } 2133 - 2134 3587 // not resolved by fast table, so compute it the slow way 2135 3588 // use jpeg approach, which requires MSbits at top 2136 3589 k = stbi__bit_reverse(a->code_buffer, 16); ··· 2146 3599 return z->value[b]; 2147 3600 } 2148 3601 2149 - static int stbi__zexpand(stbi__zbuf *z, int n) // need to make room for n bytes 3602 + stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z) 3603 + { 3604 + int b,s; 3605 + if (a->num_bits < 16) stbi__fill_bits(a); 3606 + b = z->fast[a->code_buffer & STBI__ZFAST_MASK]; 3607 + if (b) { 3608 + s = b >> 9; 3609 + a->code_buffer >>= s; 3610 + a->num_bits -= s; 3611 + 
return b & 511; 3612 + } 3613 + return stbi__zhuffman_decode_slowpath(a, z); 3614 + } 3615 + 3616 + static int stbi__zexpand(stbi__zbuf *z, char *zout, int n) // need to make room for n bytes 2150 3617 { 2151 3618 char *q; 2152 3619 int cur, limit; 3620 + z->zout = zout; 2153 3621 if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG"); 2154 3622 cur = (int) (z->zout - z->zout_start); 2155 3623 limit = (int) (z->zout_end - z->zout_start); 2156 3624 while (cur + n > limit) 2157 3625 limit *= 2; 2158 - q = (char *) realloc(z->zout_start, limit); 3626 + q = (char *) STBI_REALLOC(z->zout_start, limit); 2159 3627 if (q == NULL) return stbi__err("outofmem", "Out of memory"); 2160 3628 z->zout_start = q; 2161 3629 z->zout = q + cur; ··· 2168 3636 15,17,19,23,27,31,35,43,51,59, 2169 3637 67,83,99,115,131,163,195,227,258,0,0 }; 2170 3638 2171 - static int stbi__zlength_extra[31]= 3639 + static int stbi__zlength_extra[31]= 2172 3640 { 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 }; 2173 3641 2174 3642 static int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193, ··· 2179 3647 2180 3648 static int stbi__parse_huffman_block(stbi__zbuf *a) 2181 3649 { 3650 + char *zout = a->zout; 2182 3651 for(;;) { 2183 3652 int z = stbi__zhuffman_decode(a, &a->z_length); 2184 3653 if (z < 256) { 2185 3654 if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes 2186 - if (a->zout >= a->zout_end) if (!stbi__zexpand(a, 1)) return 0; 2187 - *a->zout++ = (char) z; 3655 + if (zout >= a->zout_end) { 3656 + if (!stbi__zexpand(a, zout, 1)) return 0; 3657 + zout = a->zout; 3658 + } 3659 + *zout++ = (char) z; 2188 3660 } else { 2189 3661 stbi_uc *p; 2190 3662 int len,dist; 2191 - if (z == 256) return 1; 3663 + if (z == 256) { 3664 + a->zout = zout; 3665 + return 1; 3666 + } 2192 3667 z -= 257; 2193 3668 len = stbi__zlength_base[z]; 2194 3669 if (stbi__zlength_extra[z]) len += stbi__zreceive(a, 
stbi__zlength_extra[z]); ··· 2196 3671 if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); 2197 3672 dist = stbi__zdist_base[z]; 2198 3673 if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]); 2199 - if (a->zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG"); 2200 - if (a->zout + len > a->zout_end) if (!stbi__zexpand(a, len)) return 0; 2201 - p = (stbi_uc *) (a->zout - dist); 2202 - while (len--) 2203 - *a->zout++ = *p++; 3674 + if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG"); 3675 + if (zout + len > a->zout_end) { 3676 + if (!stbi__zexpand(a, zout, len)) return 0; 3677 + zout = a->zout; 3678 + } 3679 + p = (stbi_uc *) (zout - dist); 3680 + if (dist == 1) { // run of one byte; common in images. 3681 + stbi_uc v = *p; 3682 + if (len) { do *zout++ = v; while (--len); } 3683 + } else { 3684 + if (len) { do *zout++ = *p++; while (--len); } 3685 + } 2204 3686 } 2205 3687 } 2206 3688 } ··· 2227 3709 n = 0; 2228 3710 while (n < hlit + hdist) { 2229 3711 int c = stbi__zhuffman_decode(a, &z_codelength); 2230 - STBI_ASSERT(c >= 0 && c < 19); 3712 + if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG"); 2231 3713 if (c < 16) 2232 3714 lencodes[n++] = (stbi_uc) c; 2233 3715 else if (c == 16) { ··· 2273 3755 if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG"); 2274 3756 if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG"); 2275 3757 if (a->zout + len > a->zout_end) 2276 - if (!stbi__zexpand(a, len)) return 0; 3758 + if (!stbi__zexpand(a, a->zout, len)) return 0; 2277 3759 memcpy(a->zout, a->zbuffer, len); 2278 3760 a->zbuffer += len; 2279 3761 a->zout += len; ··· 2356 3838 if (outlen) *outlen = (int) (a.zout - a.zout_start); 2357 3839 return a.zout_start; 2358 3840 } else { 2359 - free(a.zout_start); 3841 + STBI_FREE(a.zout_start); 2360 3842 return NULL; 2361 3843 } 2362 3844 } ··· 2377 3859 if (outlen) *outlen = (int) 
(a.zout - a.zout_start); 2378 3860 return a.zout_start; 2379 3861 } else { 2380 - free(a.zout_start); 3862 + STBI_FREE(a.zout_start); 2381 3863 return NULL; 2382 3864 } 2383 3865 } ··· 2404 3886 if (outlen) *outlen = (int) (a.zout - a.zout_start); 2405 3887 return a.zout_start; 2406 3888 } else { 2407 - free(a.zout_start); 3889 + STBI_FREE(a.zout_start); 2408 3890 return NULL; 2409 3891 } 2410 3892 } ··· 2419 3901 else 2420 3902 return -1; 2421 3903 } 3904 + #endif 2422 3905 2423 3906 // public domain "baseline" PNG decoder v0.10 Sean Barrett 2006-11-18 2424 3907 // simple implementation ··· 2430 3913 // performance 2431 3914 // - uses stb_zlib, a PD zlib implementation with fast huffman decoding 2432 3915 2433 - 3916 + #ifndef STBI_NO_PNG 2434 3917 typedef struct 2435 3918 { 2436 3919 stbi__uint32 length; 2437 3920 stbi__uint32 type; 2438 3921 } stbi__pngchunk; 2439 3922 2440 - #define PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d)) 2441 - 2442 3923 static stbi__pngchunk stbi__get_chunk_header(stbi__context *s) 2443 3924 { 2444 3925 stbi__pngchunk c; ··· 2464 3945 2465 3946 2466 3947 enum { 2467 - STBI__F_none=0, STBI__F_sub=1, STBI__F_up=2, STBI__F_avg=3, STBI__F_paeth=4, 2468 - STBI__F_avg_first, STBI__F_paeth_first 3948 + STBI__F_none=0, 3949 + STBI__F_sub=1, 3950 + STBI__F_up=2, 3951 + STBI__F_avg=3, 3952 + STBI__F_paeth=4, 3953 + // synthetic filters used for first scanline to avoid needing a dummy row of 0s 3954 + STBI__F_avg_first, 3955 + STBI__F_paeth_first 2469 3956 }; 2470 3957 2471 3958 static stbi_uc first_row_filter[5] = 2472 3959 { 2473 - STBI__F_none, STBI__F_sub, STBI__F_none, STBI__F_avg_first, STBI__F_paeth_first 3960 + STBI__F_none, 3961 + STBI__F_sub, 3962 + STBI__F_none, 3963 + STBI__F_avg_first, 3964 + STBI__F_paeth_first 2474 3965 }; 2475 3966 2476 3967 static int stbi__paeth(int a, int b, int c) ··· 2484 3975 return c; 2485 3976 } 2486 3977 2487 - #define STBI__BYTECAST(x) ((stbi_uc) ((x) & 255)) // truncate int to byte 
without warnings 3978 + static stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 }; 2488 3979 2489 3980 // create the png data from post-deflated data 2490 - static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y) 3981 + static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color) 2491 3982 { 2492 3983 stbi__context *s = a->s; 2493 3984 stbi__uint32 i,j,stride = x*out_n; 3985 + stbi__uint32 img_len, img_width_bytes; 2494 3986 int k; 2495 3987 int img_n = s->img_n; // copy it into a local for later 3988 + 2496 3989 STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1); 2497 - a->out = (stbi_uc *) stbi__malloc(x * y * out_n); 3990 + a->out = (stbi_uc *) stbi__malloc(x * y * out_n); // extra bytes to write off the end into 2498 3991 if (!a->out) return stbi__err("outofmem", "Out of memory"); 3992 + 3993 + img_width_bytes = (((img_n * x * depth) + 7) >> 3); 3994 + img_len = (img_width_bytes + 1) * y; 2499 3995 if (s->img_x == x && s->img_y == y) { 2500 - if (raw_len != (img_n * x + 1) * y) return stbi__err("not enough pixels","Corrupt PNG"); 3996 + if (raw_len != img_len) return stbi__err("not enough pixels","Corrupt PNG"); 2501 3997 } else { // interlaced: 2502 - if (raw_len < (img_n * x + 1) * y) return stbi__err("not enough pixels","Corrupt PNG"); 3998 + if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG"); 2503 3999 } 4000 + 2504 4001 for (j=0; j < y; ++j) { 2505 4002 stbi_uc *cur = a->out + stride*j; 2506 4003 stbi_uc *prior = cur - stride; 2507 4004 int filter = *raw++; 2508 - if (filter > 4) return stbi__err("invalid filter","Corrupt PNG"); 4005 + int filter_bytes = img_n; 4006 + int width = x; 4007 + if (filter > 4) 4008 + return stbi__err("invalid filter","Corrupt PNG"); 4009 + 4010 + if (depth < 8) { 4011 + STBI_ASSERT(img_width_bytes <= x); 4012 + 
cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place 4013 + filter_bytes = 1; 4014 + width = img_width_bytes; 4015 + } 4016 + 2509 4017 // if first row, use special filter that doesn't sample previous row 2510 4018 if (j == 0) filter = first_row_filter[filter]; 2511 - // handle first pixel explicitly 2512 - for (k=0; k < img_n; ++k) { 4019 + 4020 + // handle first byte explicitly 4021 + for (k=0; k < filter_bytes; ++k) { 2513 4022 switch (filter) { 2514 4023 case STBI__F_none : cur[k] = raw[k]; break; 2515 4024 case STBI__F_sub : cur[k] = raw[k]; break; ··· 2520 4029 case STBI__F_paeth_first: cur[k] = raw[k]; break; 2521 4030 } 2522 4031 } 2523 - if (img_n != out_n) cur[img_n] = 255; 2524 - raw += img_n; 2525 - cur += out_n; 2526 - prior += out_n; 4032 + 4033 + if (depth == 8) { 4034 + if (img_n != out_n) 4035 + cur[img_n] = 255; // first pixel 4036 + raw += img_n; 4037 + cur += out_n; 4038 + prior += out_n; 4039 + } else { 4040 + raw += 1; 4041 + cur += 1; 4042 + prior += 1; 4043 + } 4044 + 2527 4045 // this is a little gross, so that we don't switch per-pixel or per-component 2528 - if (img_n == out_n) { 4046 + if (depth < 8 || img_n == out_n) { 4047 + int nk = (width - 1)*img_n; 2529 4048 #define CASE(f) \ 2530 4049 case f: \ 2531 - for (i=x-1; i >= 1; --i, raw+=img_n,cur+=img_n,prior+=img_n) \ 2532 - for (k=0; k < img_n; ++k) 4050 + for (k=0; k < nk; ++k) 2533 4051 switch (filter) { 2534 - CASE(STBI__F_none) cur[k] = raw[k]; break; 2535 - CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-img_n]); break; 4052 + // "none" filter turns into a memcpy here; make that explicit. 
4053 + case STBI__F_none: memcpy(cur, raw, nk); break; 4054 + CASE(STBI__F_sub) cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); break; 2536 4055 CASE(STBI__F_up) cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break; 2537 - CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-img_n])>>1)); break; 2538 - CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-img_n],prior[k],prior[k-img_n])); break; 2539 - CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-img_n] >> 1)); break; 2540 - CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-img_n],0,0)); break; 4056 + CASE(STBI__F_avg) cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); break; 4057 + CASE(STBI__F_paeth) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); break; 4058 + CASE(STBI__F_avg_first) cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); break; 4059 + CASE(STBI__F_paeth_first) cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); break; 2541 4060 } 2542 4061 #undef CASE 4062 + raw += nk; 2543 4063 } else { 2544 4064 STBI_ASSERT(img_n+1 == out_n); 2545 4065 #define CASE(f) \ ··· 2558 4078 #undef CASE 2559 4079 } 2560 4080 } 4081 + 4082 + // we make a separate pass to expand bits to pixels; for performance, 4083 + // this could run two scanlines behind the above code, so it won't 4084 + // interfere with filtering but will still be in the cache. 4085 + if (depth < 8) { 4086 + for (j=0; j < y; ++j) { 4087 + stbi_uc *cur = a->out + stride*j; 4088 + stbi_uc *in = a->out + stride*j + x*out_n - img_width_bytes; 4089 + // unpack 1/2/4-bit into an 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit 4090 + // png guarantees byte alignment; if width is not a multiple of 8/4/2 we'll decode dummy trailing data that will be skipped in the later loop 4091 + stbi_uc scale = (color == 0) ? 
stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range 4092 + 4093 + // note that the final byte might overshoot and write more data than desired. 4094 + // we can allocate enough data that this never writes out of memory, but it 4095 + // could also overwrite the next scanline. can it overwrite non-empty data 4096 + // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel. 4097 + // so we need to explicitly clamp the final ones 4098 + 4099 + if (depth == 4) { 4100 + for (k=x*img_n; k >= 2; k-=2, ++in) { 4101 + *cur++ = scale * ((*in >> 4) ); 4102 + *cur++ = scale * ((*in ) & 0x0f); 4103 + } 4104 + if (k > 0) *cur++ = scale * ((*in >> 4) ); 4105 + } else if (depth == 2) { 4106 + for (k=x*img_n; k >= 4; k-=4, ++in) { 4107 + *cur++ = scale * ((*in >> 6) ); 4108 + *cur++ = scale * ((*in >> 4) & 0x03); 4109 + *cur++ = scale * ((*in >> 2) & 0x03); 4110 + *cur++ = scale * ((*in ) & 0x03); 4111 + } 4112 + if (k > 0) *cur++ = scale * ((*in >> 6) ); 4113 + if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03); 4114 + if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03); 4115 + } else if (depth == 1) { 4116 + for (k=x*img_n; k >= 8; k-=8, ++in) { 4117 + *cur++ = scale * ((*in >> 7) ); 4118 + *cur++ = scale * ((*in >> 6) & 0x01); 4119 + *cur++ = scale * ((*in >> 5) & 0x01); 4120 + *cur++ = scale * ((*in >> 4) & 0x01); 4121 + *cur++ = scale * ((*in >> 3) & 0x01); 4122 + *cur++ = scale * ((*in >> 2) & 0x01); 4123 + *cur++ = scale * ((*in >> 1) & 0x01); 4124 + *cur++ = scale * ((*in ) & 0x01); 4125 + } 4126 + if (k > 0) *cur++ = scale * ((*in >> 7) ); 4127 + if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01); 4128 + if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01); 4129 + if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01); 4130 + if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01); 4131 + if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01); 4132 + if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01); 4133 + } 4134 + if (img_n != out_n) { 4135 + int q; 4136 + // 
insert alpha = 255 4137 + cur = a->out + stride*j; 4138 + if (img_n == 1) { 4139 + for (q=x-1; q >= 0; --q) { 4140 + cur[q*2+1] = 255; 4141 + cur[q*2+0] = cur[q]; 4142 + } 4143 + } else { 4144 + STBI_ASSERT(img_n == 3); 4145 + for (q=x-1; q >= 0; --q) { 4146 + cur[q*4+3] = 255; 4147 + cur[q*4+2] = cur[q*3+2]; 4148 + cur[q*4+1] = cur[q*3+1]; 4149 + cur[q*4+0] = cur[q*3+0]; 4150 + } 4151 + } 4152 + } 4153 + } 4154 + } 4155 + 2561 4156 return 1; 2562 4157 } 2563 4158 2564 - static int stbi__create_png_image(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, int interlaced) 4159 + static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced) 2565 4160 { 2566 4161 stbi_uc *final; 2567 4162 int p; 2568 4163 if (!interlaced) 2569 - return stbi__create_png_image_raw(a, raw, raw_len, out_n, a->s->img_x, a->s->img_y); 4164 + return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color); 2570 4165 2571 4166 // de-interlacing 2572 4167 final = (stbi_uc *) stbi__malloc(a->s->img_x * a->s->img_y * out_n); ··· 2580 4175 x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p]; 2581 4176 y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p]; 2582 4177 if (x && y) { 2583 - if (!stbi__create_png_image_raw(a, raw, raw_len, out_n, x, y)) { 2584 - free(final); 4178 + stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y; 4179 + if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) { 4180 + STBI_FREE(final); 2585 4181 return 0; 2586 4182 } 2587 - for (j=0; j < y; ++j) 2588 - for (i=0; i < x; ++i) 2589 - memcpy(final + (j*yspc[p]+yorig[p])*a->s->img_x*out_n + (i*xspc[p]+xorig[p])*out_n, 4183 + for (j=0; j < y; ++j) { 4184 + for (i=0; i < x; ++i) { 4185 + int out_y = j*yspc[p]+yorig[p]; 4186 + int out_x = i*xspc[p]+xorig[p]; 4187 + memcpy(final + out_y*a->s->img_x*out_n + out_x*out_n, 2590 4188 a->out 
+ (j*x+i)*out_n, out_n); 2591 - free(a->out); 2592 - raw += (x*out_n+1)*y; 2593 - raw_len -= (x*out_n+1)*y; 4189 + } 4190 + } 4191 + STBI_FREE(a->out); 4192 + image_data += img_len; 4193 + image_data_len -= img_len; 2594 4194 } 2595 4195 } 2596 4196 a->out = final; ··· 2652 4252 p += 4; 2653 4253 } 2654 4254 } 2655 - free(a->out); 4255 + STBI_FREE(a->out); 2656 4256 a->out = temp_out; 2657 4257 2658 4258 STBI_NOTUSED(len); ··· 2700 4300 } else { 2701 4301 p[0] = p[2]; 2702 4302 p[2] = t; 2703 - } 4303 + } 2704 4304 p += 4; 2705 4305 } 2706 4306 } else { ··· 2715 4315 } 2716 4316 } 2717 4317 4318 + #define STBI__PNG_TYPE(a,b,c,d) (((a) << 24) + ((b) << 16) + ((c) << 8) + (d)) 4319 + 2718 4320 static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp) 2719 4321 { 2720 4322 stbi_uc palette[1024], pal_img_n=0; 2721 4323 stbi_uc has_trans=0, tc[3]; 2722 4324 stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0; 2723 - int first=1,k,interlace=0, is_iphone=0; 4325 + int first=1,k,interlace=0, color=0, depth=0, is_iphone=0; 2724 4326 stbi__context *s = z->s; 2725 4327 2726 4328 z->expanded = NULL; ··· 2729 4331 2730 4332 if (!stbi__check_png_header(s)) return 0; 2731 4333 2732 - if (scan == SCAN_type) return 1; 4334 + if (scan == STBI__SCAN_type) return 1; 2733 4335 2734 4336 for (;;) { 2735 4337 stbi__pngchunk c = stbi__get_chunk_header(s); 2736 4338 switch (c.type) { 2737 - case PNG_TYPE('C','g','B','I'): 4339 + case STBI__PNG_TYPE('C','g','B','I'): 2738 4340 is_iphone = 1; 2739 4341 stbi__skip(s, c.length); 2740 4342 break; 2741 - case PNG_TYPE('I','H','D','R'): { 2742 - int depth,color,comp,filter; 4343 + case STBI__PNG_TYPE('I','H','D','R'): { 4344 + int comp,filter; 2743 4345 if (!first) return stbi__err("multiple IHDR","Corrupt PNG"); 2744 4346 first = 0; 2745 4347 if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG"); 2746 4348 s->img_x = stbi__get32be(s); if (s->img_x > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)"); 
2747 4349   s->img_y = stbi__get32be(s); if (s->img_y > (1 << 24)) return stbi__err("too large","Very large image (corrupt?)");
2748      - depth = stbi__get8(s); if (depth != 8) return stbi__err("8bit only","PNG not supported: 8-bit only");
     4350 + depth = stbi__get8(s); if (depth != 1 && depth != 2 && depth != 4 && depth != 8) return stbi__err("1/2/4/8-bit only","PNG not supported: 1/2/4/8-bit only");
2749 4351   color = stbi__get8(s); if (color > 6) return stbi__err("bad ctype","Corrupt PNG");
2750 4352   if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
2751 4353   comp = stbi__get8(s); if (comp) return stbi__err("bad comp method","Corrupt PNG");
···
2755 4357   if (!pal_img_n) {
2756 4358      s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
2757 4359      if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
2758      -    if (scan == SCAN_header) return 1;
     4360 +    if (scan == STBI__SCAN_header) return 1;
2759 4361   } else {
2760 4362      // if paletted, then pal_n is our final components, and
2761 4363      // img_n is # components to decompress/filter.
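
Editor's note: the IHDR changes above allow 1/2/4-bit PNGs, so a scanline is no longer simply width times channels bytes. The sketch below is a minimal illustration of that arithmetic, mirroring the `img_n` and `img_len` expressions that appear in this diff. The helper names (`png_stored_channels`, `png_row_bytes`, `png_raw_len`, `png_gray_scale`) are hypothetical and are not part of stb_image.h. The gray scale factor is computed here as `255 / ((1 << depth) - 1)`, which is an assumption about what a depth scale table would hold, not a copy of the library's table.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers for illustration; none of these exist in stb_image.h. */

/* Channels stored per pixel for a PNG color type (0, 2, 3, 4 or 6).
   Palette images (type 3) store a single index per pixel. */
static int png_stored_channels(int color_type)
{
    if (color_type == 3) return 1;
    return ((color_type & 2) ? 3 : 1) + ((color_type & 4) ? 1 : 0);
}

/* Bytes in one packed scanline, excluding the leading filter byte.
   Rows are padded up to a whole byte. Assumes width is small enough
   that the product fits in 32 bits (the diff rejects widths above 1 << 24). */
static uint32_t png_row_bytes(uint32_t width, int channels, int depth)
{
    assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
    return (width * (uint32_t)channels * (uint32_t)depth + 7) >> 3;
}

/* Size of the inflated image data: each row carries one filter byte. */
static uint32_t png_raw_len(uint32_t width, uint32_t height, int channels, int depth)
{
    return (png_row_bytes(width, channels, depth) + 1) * height;
}

/* Factor that stretches a depth-bit grayscale sample to the 0..255 range. */
static int png_gray_scale(int depth)
{
    assert(depth == 1 || depth == 2 || depth == 4 || depth == 8);
    return 255 / ((1 << depth) - 1);
}
```

For example, a 9-pixel-wide 1-bit grayscale row occupies 2 data bytes plus 1 filter byte, and each decoded sample is multiplied by 255 to reach full range.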
··· 2766 4368 break; 2767 4369 } 2768 4370 2769 - case PNG_TYPE('P','L','T','E'): { 4371 + case STBI__PNG_TYPE('P','L','T','E'): { 2770 4372 if (first) return stbi__err("first not IHDR", "Corrupt PNG"); 2771 4373 if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG"); 2772 4374 pal_len = c.length / 3; ··· 2780 4382 break; 2781 4383 } 2782 4384 2783 - case PNG_TYPE('t','R','N','S'): { 4385 + case STBI__PNG_TYPE('t','R','N','S'): { 2784 4386 if (first) return stbi__err("first not IHDR", "Corrupt PNG"); 2785 4387 if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG"); 2786 4388 if (pal_img_n) { 2787 - if (scan == SCAN_header) { s->img_n = 4; return 1; } 4389 + if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; } 2788 4390 if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG"); 2789 4391 if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG"); 2790 4392 pal_img_n = 4; ··· 2795 4397 if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG"); 2796 4398 has_trans = 1; 2797 4399 for (k=0; k < s->img_n; ++k) 2798 - tc[k] = (stbi_uc) (stbi__get16be(s) & 255); // non 8-bit images will be larger 4400 + tc[k] = (stbi_uc) (stbi__get16be(s) & 255) * stbi__depth_scale_table[depth]; // non 8-bit images will be larger 2799 4401 } 2800 4402 break; 2801 4403 } 2802 4404 2803 - case PNG_TYPE('I','D','A','T'): { 4405 + case STBI__PNG_TYPE('I','D','A','T'): { 2804 4406 if (first) return stbi__err("first not IHDR", "Corrupt PNG"); 2805 4407 if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG"); 2806 - if (scan == SCAN_header) { s->img_n = pal_img_n; return 1; } 4408 + if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; } 4409 + if ((int)(ioff + c.length) < (int)ioff) return 0; 2807 4410 if (ioff + c.length > idata_limit) { 2808 4411 stbi_uc *p; 2809 4412 if (idata_limit == 0) idata_limit = c.length > 4096 ? 
c.length : 4096; 2810 4413 while (ioff + c.length > idata_limit) 2811 4414 idata_limit *= 2; 2812 - p = (stbi_uc *) realloc(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory"); 4415 + p = (stbi_uc *) STBI_REALLOC(z->idata, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory"); 2813 4416 z->idata = p; 2814 4417 } 2815 4418 if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG"); ··· 2817 4420 break; 2818 4421 } 2819 4422 2820 - case PNG_TYPE('I','E','N','D'): { 2821 - stbi__uint32 raw_len; 4423 + case STBI__PNG_TYPE('I','E','N','D'): { 4424 + stbi__uint32 raw_len, bpl; 2822 4425 if (first) return stbi__err("first not IHDR", "Corrupt PNG"); 2823 - if (scan != SCAN_load) return 1; 4426 + if (scan != STBI__SCAN_load) return 1; 2824 4427 if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG"); 2825 - z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, 16384, (int *) &raw_len, !is_iphone); 4428 + // initial guess for decoded data size to avoid unnecessary reallocs 4429 + bpl = (s->img_x * depth + 7) / 8; // bytes per line, per component 4430 + raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */; 4431 + z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone); 2826 4432 if (z->expanded == NULL) return 0; // zlib should set error 2827 - free(z->idata); z->idata = NULL; 4433 + STBI_FREE(z->idata); z->idata = NULL; 2828 4434 if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans) 2829 4435 s->img_out_n = s->img_n+1; 2830 4436 else 2831 4437 s->img_out_n = s->img_n; 2832 - if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, interlace)) return 0; 4438 + if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, depth, color, interlace)) return 0; 2833 4439 if (has_trans) 2834 4440 if 
(!stbi__compute_transparency(z, tc, s->img_out_n)) return 0; 2835 4441 if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2) ··· 2842 4448 if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n)) 2843 4449 return 0; 2844 4450 } 2845 - free(z->expanded); z->expanded = NULL; 4451 + STBI_FREE(z->expanded); z->expanded = NULL; 2846 4452 return 1; 2847 4453 } 2848 4454 ··· 2872 4478 { 2873 4479 unsigned char *result=NULL; 2874 4480 if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error"); 2875 - if (stbi__parse_png_file(p, SCAN_load, req_comp)) { 4481 + if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) { 2876 4482 result = p->out; 2877 4483 p->out = NULL; 2878 4484 if (req_comp && req_comp != p->s->img_out_n) { ··· 2884 4490 *y = p->s->img_y; 2885 4491 if (n) *n = p->s->img_out_n; 2886 4492 } 2887 - free(p->out); p->out = NULL; 2888 - free(p->expanded); p->expanded = NULL; 2889 - free(p->idata); p->idata = NULL; 4493 + STBI_FREE(p->out); p->out = NULL; 4494 + STBI_FREE(p->expanded); p->expanded = NULL; 4495 + STBI_FREE(p->idata); p->idata = NULL; 2890 4496 2891 4497 return result; 2892 4498 } ··· 2908 4514 2909 4515 static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp) 2910 4516 { 2911 - if (!stbi__parse_png_file(p, SCAN_header, 0)) { 4517 + if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) { 2912 4518 stbi__rewind( p->s ); 2913 4519 return 0; 2914 4520 } ··· 2924 4530 p.s = s; 2925 4531 return stbi__png_info_raw(&p, x, y, comp); 2926 4532 } 4533 + #endif 2927 4534 2928 4535 // Microsoft/Windows BMP image 4536 + 4537 + #ifndef STBI_NO_BMP 2929 4538 static int stbi__bmp_test_raw(stbi__context *s) 2930 4539 { 2931 4540 int r; ··· 2992 4601 static stbi_uc *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) 2993 4602 { 2994 4603 stbi_uc *out; 2995 - unsigned int mr=0,mg=0,mb=0,ma=0, fake_a=0; 4604 + unsigned int mr=0,mg=0,mb=0,ma=0, all_a=255; 2996 4605 stbi_uc pal[256][4]; 2997 
4606 int psize=0,i,j,compress=0,width; 2998 4607 int bpp, flip_vertically, pad, target, offset, hsz; ··· 3041 4650 mg = 0xffu << 8; 3042 4651 mb = 0xffu << 0; 3043 4652 ma = 0xffu << 24; 3044 - fake_a = 1; // @TODO: check for cases like alpha value is all 0 and switch it to 255 3045 - STBI_NOTUSED(fake_a); 4653 + all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0 3046 4654 } else { 3047 4655 mr = 31u << 10; 3048 4656 mg = 31u << 5; ··· 3088 4696 if (!out) return stbi__errpuc("outofmem", "Out of memory"); 3089 4697 if (bpp < 16) { 3090 4698 int z=0; 3091 - if (psize == 0 || psize > 256) { free(out); return stbi__errpuc("invalid", "Corrupt BMP"); } 4699 + if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); } 3092 4700 for (i=0; i < psize; ++i) { 3093 4701 pal[i][2] = stbi__get8(s); 3094 4702 pal[i][1] = stbi__get8(s); ··· 3099 4707 stbi__skip(s, offset - 14 - hsz - psize * (hsz == 12 ? 3 : 4)); 3100 4708 if (bpp == 4) width = (s->img_x + 1) >> 1; 3101 4709 else if (bpp == 8) width = s->img_x; 3102 - else { free(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); } 4710 + else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); } 3103 4711 pad = (-width)&3; 3104 4712 for (j=0; j < (int) s->img_y; ++j) { 3105 4713 for (i=0; i < (int) s->img_x; i += 2) { ··· 3137 4745 easy = 2; 3138 4746 } 3139 4747 if (!easy) { 3140 - if (!mr || !mg || !mb) { free(out); return stbi__errpuc("bad masks", "Corrupt BMP"); } 4748 + if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); } 3141 4749 // right shift amt to put high bit in position #7 3142 4750 rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr); 3143 4751 gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg); ··· 3153 4761 out[z+0] = stbi__get8(s); 3154 4762 z += 3; 3155 4763 a = (easy == 2 ? 
stbi__get8(s) : 255); 4764 + all_a |= a; 3156 4765 if (target == 4) out[z++] = a; 3157 4766 } 3158 4767 } else { 3159 4768 for (i=0; i < (int) s->img_x; ++i) { 3160 - stbi__uint32 v = (stbi__uint32) (bpp == 16 ? stbi__get16le(s) : stbi__get32le(s)); 4769 + stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s)); 3161 4770 int a; 3162 4771 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount)); 3163 4772 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount)); 3164 4773 out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount)); 3165 4774 a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255); 3166 - if (target == 4) out[z++] = STBI__BYTECAST(a); 4775 + all_a |= a; 4776 + if (target == 4) out[z++] = STBI__BYTECAST(a); 3167 4777 } 3168 4778 } 3169 4779 stbi__skip(s, pad); 3170 4780 } 3171 4781 } 4782 + 4783 + // if alpha channel is all 0s, replace with all 255s 4784 + if (target == 4 && all_a == 0) 4785 + for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4) 4786 + out[i] = 255; 4787 + 3172 4788 if (flip_vertically) { 3173 4789 stbi_uc t; 3174 4790 for (j=0; j < (int) s->img_y>>1; ++j) { ··· 3190 4806 if (comp) *comp = s->img_n; 3191 4807 return out; 3192 4808 } 4809 + #endif 3193 4810 3194 4811 // Targa Truevision - TGA 3195 4812 // by Jonathan Dummer 3196 - 4813 + #ifndef STBI_NO_TGA 3197 4814 static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp) 3198 4815 { 3199 4816 int tga_w, tga_h, tga_comp; ··· 3313 4930 *y = tga_height; 3314 4931 if (comp) *comp = tga_comp; 3315 4932 3316 - tga_data = (unsigned char*)stbi__malloc( tga_width * tga_height * tga_comp ); 4933 + tga_data = (unsigned char*)stbi__malloc( (size_t)tga_width * tga_height * tga_comp ); 3317 4934 if (!tga_data) return stbi__errpuc("outofmem", "Out of memory"); 3318 4935 3319 4936 // skip to the data's starting position (offset usually = 0) ··· 3321 4938 3322 4939 if ( !tga_indexed && !tga_is_RLE) { 3323 4940 for (i=0; i < 
tga_height; ++i) { 3324 - int y = tga_inverted ? tga_height -i - 1 : i; 3325 - stbi_uc *tga_row = tga_data + y*tga_width*tga_comp; 4941 + int row = tga_inverted ? tga_height -i - 1 : i; 4942 + stbi_uc *tga_row = tga_data + row*tga_width*tga_comp; 3326 4943 stbi__getn(s, tga_row, tga_width * tga_comp); 3327 4944 } 3328 4945 } else { ··· 3334 4951 // load the palette 3335 4952 tga_palette = (unsigned char*)stbi__malloc( tga_palette_len * tga_palette_bits / 8 ); 3336 4953 if (!tga_palette) { 3337 - free(tga_data); 4954 + STBI_FREE(tga_data); 3338 4955 return stbi__errpuc("outofmem", "Out of memory"); 3339 4956 } 3340 4957 if (!stbi__getn(s, tga_palette, tga_palette_len * tga_palette_bits / 8 )) { 3341 - free(tga_data); 3342 - free(tga_palette); 4958 + STBI_FREE(tga_data); 4959 + STBI_FREE(tga_palette); 3343 4960 return stbi__errpuc("bad palette", "Corrupt TGA"); 3344 4961 } 3345 4962 } ··· 3421 5038 // clear my palette, if I had one 3422 5039 if ( tga_palette != NULL ) 3423 5040 { 3424 - free( tga_palette ); 5041 + STBI_FREE( tga_palette ); 3425 5042 } 3426 5043 } 3427 5044 ··· 3449 5066 // OK, done 3450 5067 return tga_data; 3451 5068 } 5069 + #endif 3452 5070 3453 5071 // ************************************************************************************************* 3454 5072 // Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB 3455 5073 5074 + #ifndef STBI_NO_PSD 3456 5075 static int stbi__psd_test(stbi__context *s) 3457 5076 { 3458 5077 int r = (stbi__get32be(s) == 0x38425053); ··· 3465 5084 int pixelCount; 3466 5085 int channelCount, compression; 3467 5086 int channel, i, count, len; 5087 + int bitdepth; 3468 5088 int w,h; 3469 5089 stbi_uc *out; 3470 5090 ··· 3487 5107 // Read the rows and columns of the image. 3488 5108 h = stbi__get32be(s); 3489 5109 w = stbi__get32be(s); 3490 - 5110 + 3491 5111 // Make sure the depth is 8 bits. 
3492      - if (stbi__get16be(s) != 8)
3493      -    return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 bit");
     5112 + bitdepth = stbi__get16be(s);
     5113 + if (bitdepth != 8 && bitdepth != 16)
     5114 +    return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
3494 5115
3495 5116   // Make sure the color mode is RGB.
3496 5117   // Valid options are:
···
3529 5150
3530 5151   // Initialize the data to zero.
3531 5152   //memset( out, 0, pixelCount * 4 );
3532      -
     5153 +
3533 5154   // Finally, the image data.
3534 5155   if (compression) {
3535 5156      // RLE as used by .PSD and .TIFF
···
3547 5168      // Read the RLE data by channel.
3548 5169      for (channel = 0; channel < 4; channel++) {
3549 5170         stbi_uc *p;
3550      -
     5171 +
3551 5172         p = out+channel;
3552 5173         if (channel >= channelCount) {
3553 5174            // Fill this channel with default data.
3554      -            for (i = 0; i < pixelCount; i++) *p = (channel == 3 ? 255 : 0), p += 4;
     5175 +            for (i = 0; i < pixelCount; i++, p += 4)
     5176 +               *p = (channel == 3 ? 255 : 0);
3555 5177         } else {
3556 5178            // Read the RLE data.
3557 5179            count = 0;
···
3585 5207            }
3586 5208         }
3587 5209      }
3588      -
     5210 +
3589 5211   } else {
3590 5212      // We're at the raw image data. It's each channel in order (Red, Green, Blue, Alpha, ...)
3591 5213      // where each channel consists of an 8-bit value for each pixel in the image.
3592      -
     5214 +
3593 5215      // Read the data by channel.
3594 5216      for (channel = 0; channel < 4; channel++) {
3595 5217         stbi_uc *p;
3596      -
     5218 +
3597 5219         p = out + channel;
3598      -         if (channel > channelCount) {
     5220 +         if (channel >= channelCount) {
3599 5221            // Fill this channel with default data.
3600      -            for (i = 0; i < pixelCount; i++) *p = channel == 3 ? 255 : 0, p += 4;
     5222 +            stbi_uc val = channel == 3 ? 255 : 0;
     5223 +            for (i = 0; i < pixelCount; i++, p += 4)
     5224 +               *p = val;
3601 5225         } else {
3602 5226            // Read the data.
3603 - for (i = 0; i < pixelCount; i++) 3604 - *p = stbi__get8(s), p += 4; 5227 + if (bitdepth == 16) { 5228 + for (i = 0; i < pixelCount; i++, p += 4) 5229 + *p = (stbi_uc) (stbi__get16be(s) >> 8); 5230 + } else { 5231 + for (i = 0; i < pixelCount; i++, p += 4) 5232 + *p = stbi__get8(s); 5233 + } 3605 5234 } 3606 5235 } 3607 5236 } ··· 3611 5240 if (out == NULL) return out; // stbi__convert_format frees input on failure 3612 5241 } 3613 5242 3614 - if (comp) *comp = channelCount; 5243 + if (comp) *comp = 4; 3615 5244 *y = h; 3616 5245 *x = w; 3617 - 5246 + 3618 5247 return out; 3619 5248 } 5249 + #endif 3620 5250 3621 5251 // ************************************************************************************************* 3622 5252 // Softimage PIC loader ··· 3625 5255 // See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format 3626 5256 // See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/ 3627 5257 5258 + #ifndef STBI_NO_PIC 3628 5259 static int stbi__pic_is4(stbi__context *s,const char *str) 3629 5260 { 3630 5261 int i; ··· 3757 5388 3758 5389 if (count >= 128) { // Repeated 3759 5390 stbi_uc value[4]; 3760 - int i; 3761 5391 3762 5392 if (count==128) 3763 5393 count = stbi__get16be(s); ··· 3812 5442 memset(result, 0xff, x*y*4); 3813 5443 3814 5444 if (!stbi__pic_load_core(s,x,y,comp, result)) { 3815 - free(result); 5445 + STBI_FREE(result); 3816 5446 result=0; 3817 5447 } 3818 5448 *px = x; ··· 3829 5459 stbi__rewind(s); 3830 5460 return r; 3831 5461 } 5462 + #endif 3832 5463 3833 5464 // ************************************************************************************************* 3834 5465 // GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb 3835 - typedef struct 5466 + 5467 + #ifndef STBI_NO_GIF 5468 + typedef struct 3836 5469 { 3837 5470 stbi__int16 prefix; 3838 5471 stbi_uc first; ··· 3842 5475 typedef struct 3843 5476 { 3844 5477 int w,h; 3845 - stbi_uc *out; // output buffer (always 4 
components) 3846 - int flags, bgindex, ratio, transparent, eflags; 5478 + stbi_uc *out, *old_out; // output buffer (always 4 components) 5479 + int flags, bgindex, ratio, transparent, eflags, delay; 3847 5480 stbi_uc pal[256][4]; 3848 5481 stbi_uc lpal[256][4]; 3849 5482 stbi__gif_lzw codes[4096]; ··· 3880 5513 pal[i][2] = stbi__get8(s); 3881 5514 pal[i][1] = stbi__get8(s); 3882 5515 pal[i][0] = stbi__get8(s); 3883 - pal[i][3] = transp ? 0 : 255; 3884 - } 5516 + pal[i][3] = transp == i ? 0 : 255; 5517 + } 3885 5518 } 3886 5519 3887 5520 static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info) ··· 3892 5525 3893 5526 version = stbi__get8(s); 3894 5527 if (version != '7' && version != '9') return stbi__err("not GIF", "Corrupt GIF"); 3895 - if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF"); 3896 - 5528 + if (stbi__get8(s) != 'a') return stbi__err("not GIF", "Corrupt GIF"); 5529 + 3897 5530 stbi__g_failure_reason = ""; 3898 5531 g->w = stbi__get16le(s); 3899 5532 g->h = stbi__get16le(s); ··· 3914 5547 3915 5548 static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp) 3916 5549 { 3917 - stbi__gif g; 5550 + stbi__gif g; 3918 5551 if (!stbi__gif_header(s, &g, comp, 1)) { 3919 5552 stbi__rewind( s ); 3920 5553 return 0; ··· 3934 5567 stbi__out_gif_code(g, g->codes[code].prefix); 3935 5568 3936 5569 if (g->cur_y >= g->max_y) return; 3937 - 5570 + 3938 5571 p = &g->out[g->cur_x + g->cur_y]; 3939 5572 c = &g->color_table[g->codes[code].suffix * 4]; 3940 5573 ··· 3961 5594 static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g) 3962 5595 { 3963 5596 stbi_uc lzw_cs; 3964 - stbi__int32 len, code; 5597 + stbi__int32 len, init_code; 3965 5598 stbi__uint32 first; 3966 5599 stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear; 3967 5600 stbi__gif_lzw *p; 3968 5601 3969 5602 lzw_cs = stbi__get8(s); 5603 + if (lzw_cs > 12) return NULL; 3970 5604 clear = 1 << lzw_cs; 3971 5605 first = 1; 
3972 5606 codesize = lzw_cs + 1; 3973 5607 codemask = (1 << codesize) - 1; 3974 5608 bits = 0; 3975 5609 valid_bits = 0; 3976 - for (code = 0; code < clear; code++) { 3977 - g->codes[code].prefix = -1; 3978 - g->codes[code].first = (stbi_uc) code; 3979 - g->codes[code].suffix = (stbi_uc) code; 5610 + for (init_code = 0; init_code < clear; init_code++) { 5611 + g->codes[init_code].prefix = -1; 5612 + g->codes[init_code].first = (stbi_uc) init_code; 5613 + g->codes[init_code].suffix = (stbi_uc) init_code; 3980 5614 } 3981 5615 3982 5616 // support no starting clear code ··· 3988 5622 if (valid_bits < codesize) { 3989 5623 if (len == 0) { 3990 5624 len = stbi__get8(s); // start new block 3991 - if (len == 0) 5625 + if (len == 0) 3992 5626 return g->out; 3993 5627 } 3994 5628 --len; ··· 4033 5667 } else { 4034 5668 return stbi__errpuc("illegal code in raster", "Corrupt GIF"); 4035 5669 } 4036 - } 5670 + } 4037 5671 } 4038 5672 } 4039 5673 4040 - static void stbi__fill_gif_background(stbi__gif *g) 5674 + static void stbi__fill_gif_background(stbi__gif *g, int x0, int y0, int x1, int y1) 4041 5675 { 4042 - int i; 5676 + int x, y; 4043 5677 stbi_uc *c = g->pal[g->bgindex]; 4044 - // @OPTIMIZE: write a dword at a time 4045 - for (i = 0; i < g->w * g->h * 4; i += 4) { 4046 - stbi_uc *p = &g->out[i]; 4047 - p[0] = c[2]; 4048 - p[1] = c[1]; 4049 - p[2] = c[0]; 4050 - p[3] = c[3]; 5678 + for (y = y0; y < y1; y += 4 * g->w) { 5679 + for (x = x0; x < x1; x += 4) { 5680 + stbi_uc *p = &g->out[y + x]; 5681 + p[0] = c[2]; 5682 + p[1] = c[1]; 5683 + p[2] = c[0]; 5684 + p[3] = 0; 5685 + } 4051 5686 } 4052 5687 } 4053 5688 ··· 4055 5690 static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp) 4056 5691 { 4057 5692 int i; 4058 - stbi_uc *old_out = 0; 5693 + stbi_uc *prev_out = 0; 4059 5694 4060 - if (g->out == 0) { 4061 - if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header 4062 - g->out = (stbi_uc *) 
stbi__malloc(4 * g->w * g->h); 4063 - if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory"); 4064 - stbi__fill_gif_background(g); 4065 - } else { 4066 - // animated-gif-only path 4067 - if (((g->eflags & 0x1C) >> 2) == 3) { 4068 - old_out = g->out; 4069 - g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h); 4070 - if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory"); 4071 - memcpy(g->out, old_out, g->w*g->h*4); 4072 - } 5695 + if (g->out == 0 && !stbi__gif_header(s, g, comp,0)) 5696 + return 0; // stbi__g_failure_reason set by stbi__gif_header 5697 + 5698 + prev_out = g->out; 5699 + g->out = (stbi_uc *) stbi__malloc(4 * g->w * g->h); 5700 + if (g->out == 0) return stbi__errpuc("outofmem", "Out of memory"); 5701 + 5702 + switch ((g->eflags & 0x1C) >> 2) { 5703 + case 0: // unspecified (also always used on 1st frame) 5704 + stbi__fill_gif_background(g, 0, 0, 4 * g->w, 4 * g->w * g->h); 5705 + break; 5706 + case 1: // do not dispose 5707 + if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h); 5708 + g->old_out = prev_out; 5709 + break; 5710 + case 2: // dispose to background 5711 + if (prev_out) memcpy(g->out, prev_out, 4 * g->w * g->h); 5712 + stbi__fill_gif_background(g, g->start_x, g->start_y, g->max_x, g->max_y); 5713 + break; 5714 + case 3: // dispose to previous 5715 + if (g->old_out) { 5716 + for (i = g->start_y; i < g->max_y; i += 4 * g->w) 5717 + memcpy(&g->out[i + g->start_x], &g->old_out[i + g->start_x], g->max_x - g->start_x); 5718 + } 5719 + break; 4073 5720 } 4074 - 5721 + 4075 5722 for (;;) { 4076 5723 switch (stbi__get8(s)) { 4077 5724 case 0x2C: /* Image Descriptor */ 4078 5725 { 5726 + int prev_trans = -1; 4079 5727 stbi__int32 x, y, w, h; 4080 5728 stbi_uc *o; 4081 5729 ··· 4106 5754 4107 5755 if (g->lflags & 0x80) { 4108 5756 stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? 
g->transparent : -1); 4109 - g->color_table = (stbi_uc *) g->lpal; 5757 + g->color_table = (stbi_uc *) g->lpal; 4110 5758 } else if (g->flags & 0x80) { 4111 - for (i=0; i < 256; ++i) // @OPTIMIZE: stbi__jpeg_reset only the previous transparent 4112 - g->pal[i][3] = 255; 4113 - if (g->transparent >= 0 && (g->eflags & 0x01)) 5759 + if (g->transparent >= 0 && (g->eflags & 0x01)) { 5760 + prev_trans = g->pal[g->transparent][3]; 4114 5761 g->pal[g->transparent][3] = 0; 5762 + } 4115 5763 g->color_table = (stbi_uc *) g->pal; 4116 5764 } else 4117 5765 return stbi__errpuc("missing color table", "Corrupt GIF"); 4118 - 5766 + 4119 5767 o = stbi__process_gif_raster(s, g); 4120 5768 if (o == NULL) return NULL; 4121 5769 4122 - if (req_comp && req_comp != 4) 4123 - o = stbi__convert_format(o, 4, req_comp, g->w, g->h); 5770 + if (prev_trans != -1) 5771 + g->pal[g->transparent][3] = (stbi_uc) prev_trans; 5772 + 4124 5773 return o; 4125 5774 } 4126 5775 ··· 4131 5780 len = stbi__get8(s); 4132 5781 if (len == 4) { 4133 5782 g->eflags = stbi__get8(s); 4134 - stbi__get16le(s); // delay 5783 + g->delay = stbi__get16le(s); 4135 5784 g->transparent = stbi__get8(s); 4136 5785 } else { 4137 5786 stbi__skip(s, len); ··· 4150 5799 return stbi__errpuc("unknown code", "Corrupt GIF"); 4151 5800 } 4152 5801 } 5802 + 5803 + STBI_NOTUSED(req_comp); 4153 5804 } 4154 5805 4155 5806 static stbi_uc *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) ··· 4163 5814 if (u) { 4164 5815 *x = g.w; 4165 5816 *y = g.h; 5817 + if (req_comp && req_comp != 4) 5818 + u = stbi__convert_format(u, 4, req_comp, g.w, g.h); 4166 5819 } 5820 + else if (g.out) 5821 + STBI_FREE(g.out); 4167 5822 4168 5823 return u; 4169 5824 } ··· 4172 5827 { 4173 5828 return stbi__gif_info_raw(s,x,y,comp); 4174 5829 } 4175 - 5830 + #endif 4176 5831 4177 5832 // ************************************************************************************************* 4178 5833 // Radiance RGBE HDR loader ··· 4261 5916 // 
Check identifier 4262 5917 if (strcmp(stbi__hdr_gettoken(s,buffer), "#?RADIANCE") != 0) 4263 5918 return stbi__errpf("not HDR", "Corrupt HDR image"); 4264 - 5919 + 4265 5920 // Parse header 4266 5921 for(;;) { 4267 5922 token = stbi__hdr_gettoken(s,buffer); ··· 4322 5977 stbi__hdr_convert(hdr_data, rgbe, req_comp); 4323 5978 i = 1; 4324 5979 j = 0; 4325 - free(scanline); 5980 + STBI_FREE(scanline); 4326 5981 goto main_decode_loop; // yes, this makes no sense 4327 5982 } 4328 5983 len <<= 8; 4329 5984 len |= stbi__get8(s); 4330 - if (len != width) { free(hdr_data); free(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); } 5985 + if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); } 4331 5986 if (scanline == NULL) scanline = (stbi_uc *) stbi__malloc(width * 4); 4332 - 5987 + 4333 5988 for (k = 0; k < 4; ++k) { 4334 5989 i = 0; 4335 5990 while (i < width) { ··· 4350 6005 for (i=0; i < width; ++i) 4351 6006 stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp); 4352 6007 } 4353 - free(scanline); 6008 + STBI_FREE(scanline); 4354 6009 } 4355 6010 4356 6011 return hdr_data; ··· 4396 6051 } 4397 6052 #endif // STBI_NO_HDR 4398 6053 6054 + #ifndef STBI_NO_BMP 4399 6055 static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp) 4400 6056 { 4401 6057 int hsz; ··· 4423 6079 *comp = stbi__get16le(s) / 8; 4424 6080 return 1; 4425 6081 } 6082 + #endif 4426 6083 6084 + #ifndef STBI_NO_PSD 4427 6085 static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp) 4428 6086 { 4429 6087 int channelCount; ··· 4454 6112 *comp = 4; 4455 6113 return 1; 4456 6114 } 6115 + #endif 4457 6116 6117 + #ifndef STBI_NO_PIC 4458 6118 static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp) 4459 6119 { 4460 6120 int act_comp=0,num_packets=0,chained; 4461 6121 stbi__pic_packet packets[10]; 4462 6122 4463 - stbi__skip(s, 92); 6123 + if 
(!stbi__pic_is4(s,"\x53\x80\xF6\x34")) { 6124 + stbi__rewind(s); 6125 + return 0; 6126 + } 6127 + 6128 + stbi__skip(s, 88); 4464 6129 4465 6130 *x = stbi__get16be(s); 4466 6131 *y = stbi__get16be(s); 4467 - if (stbi__at_eof(s)) return 0; 6132 + if (stbi__at_eof(s)) { 6133 + stbi__rewind( s); 6134 + return 0; 6135 + } 4468 6136 if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) { 4469 - stbi__rewind( s ); 4470 - return 0; 6137 + stbi__rewind( s ); 6138 + return 0; 4471 6139 } 4472 6140 4473 6141 stbi__skip(s, 8); ··· 4499 6167 4500 6168 return 1; 4501 6169 } 6170 + #endif 6171 + 6172 + // ************************************************************************************************* 6173 + // Portable Gray Map and Portable Pixel Map loader 6174 + // by Ken Miller 6175 + // 6176 + // PGM: http://netpbm.sourceforge.net/doc/pgm.html 6177 + // PPM: http://netpbm.sourceforge.net/doc/ppm.html 6178 + // 6179 + // Known limitations: 6180 + // Does not support comments in the header section 6181 + // Does not support ASCII image data (formats P2 and P3) 6182 + // Does not support 16-bit-per-channel 6183 + 6184 + #ifndef STBI_NO_PNM 6185 + 6186 + static int stbi__pnm_test(stbi__context *s) 6187 + { 6188 + char p, t; 6189 + p = (char) stbi__get8(s); 6190 + t = (char) stbi__get8(s); 6191 + if (p != 'P' || (t != '5' && t != '6')) { 6192 + stbi__rewind( s ); 6193 + return 0; 6194 + } 6195 + return 1; 6196 + } 6197 + 6198 + static stbi_uc *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp) 6199 + { 6200 + stbi_uc *out; 6201 + if (!stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n)) 6202 + return 0; 6203 + *x = s->img_x; 6204 + *y = s->img_y; 6205 + *comp = s->img_n; 6206 + 6207 + out = (stbi_uc *) stbi__malloc(s->img_n * s->img_x * s->img_y); 6208 + if (!out) return stbi__errpuc("outofmem", "Out of memory"); 6209 + stbi__getn(s, out, s->img_n * s->img_x * s->img_y); 6210 + 6211 + if (req_comp && req_comp != s->img_n) { 6212 + out = 
stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y); 6213 + if (out == NULL) return out; // stbi__convert_format frees input on failure 6214 + } 6215 + return out; 6216 + } 6217 + 6218 + static int stbi__pnm_isspace(char c) 6219 + { 6220 + return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r'; 6221 + } 6222 + 6223 + static void stbi__pnm_skip_whitespace(stbi__context *s, char *c) 6224 + { 6225 + while (!stbi__at_eof(s) && stbi__pnm_isspace(*c)) 6226 + *c = (char) stbi__get8(s); 6227 + } 6228 + 6229 + static int stbi__pnm_isdigit(char c) 6230 + { 6231 + return c >= '0' && c <= '9'; 6232 + } 6233 + 6234 + static int stbi__pnm_getinteger(stbi__context *s, char *c) 6235 + { 6236 + int value = 0; 6237 + 6238 + while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) { 6239 + value = value*10 + (*c - '0'); 6240 + *c = (char) stbi__get8(s); 6241 + } 6242 + 6243 + return value; 6244 + } 6245 + 6246 + static int stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp) 6247 + { 6248 + int maxv; 6249 + char c, p, t; 6250 + 6251 + stbi__rewind( s ); 6252 + 6253 + // Get identifier 6254 + p = (char) stbi__get8(s); 6255 + t = (char) stbi__get8(s); 6256 + if (p != 'P' || (t != '5' && t != '6')) { 6257 + stbi__rewind( s ); 6258 + return 0; 6259 + } 6260 + 6261 + *comp = (t == '6') ? 
3 : 1; // '5' is 1-component .pgm; '6' is 3-component .ppm 6262 + 6263 + c = (char) stbi__get8(s); 6264 + stbi__pnm_skip_whitespace(s, &c); 6265 + 6266 + *x = stbi__pnm_getinteger(s, &c); // read width 6267 + stbi__pnm_skip_whitespace(s, &c); 6268 + 6269 + *y = stbi__pnm_getinteger(s, &c); // read height 6270 + stbi__pnm_skip_whitespace(s, &c); 6271 + 6272 + maxv = stbi__pnm_getinteger(s, &c); // read max value 6273 + 6274 + if (maxv > 255) 6275 + return stbi__err("max value > 255", "PPM image not 8-bit"); 6276 + else 6277 + return 1; 6278 + } 6279 + #endif 4502 6280 4503 6281 static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp) 4504 6282 { 4505 - if (stbi__jpeg_info(s, x, y, comp)) 4506 - return 1; 4507 - if (stbi__png_info(s, x, y, comp)) 4508 - return 1; 4509 - if (stbi__gif_info(s, x, y, comp)) 4510 - return 1; 4511 - if (stbi__bmp_info(s, x, y, comp)) 4512 - return 1; 4513 - if (stbi__psd_info(s, x, y, comp)) 4514 - return 1; 4515 - if (stbi__pic_info(s, x, y, comp)) 4516 - return 1; 6283 + #ifndef STBI_NO_JPEG 6284 + if (stbi__jpeg_info(s, x, y, comp)) return 1; 6285 + #endif 6286 + 6287 + #ifndef STBI_NO_PNG 6288 + if (stbi__png_info(s, x, y, comp)) return 1; 6289 + #endif 6290 + 6291 + #ifndef STBI_NO_GIF 6292 + if (stbi__gif_info(s, x, y, comp)) return 1; 6293 + #endif 6294 + 6295 + #ifndef STBI_NO_BMP 6296 + if (stbi__bmp_info(s, x, y, comp)) return 1; 6297 + #endif 6298 + 6299 + #ifndef STBI_NO_PSD 6300 + if (stbi__psd_info(s, x, y, comp)) return 1; 6301 + #endif 6302 + 6303 + #ifndef STBI_NO_PIC 6304 + if (stbi__pic_info(s, x, y, comp)) return 1; 6305 + #endif 6306 + 6307 + #ifndef STBI_NO_PNM 6308 + if (stbi__pnm_info(s, x, y, comp)) return 1; 6309 + #endif 6310 + 4517 6311 #ifndef STBI_NO_HDR 4518 - if (stbi__hdr_info(s, x, y, comp)) 4519 - return 1; 6312 + if (stbi__hdr_info(s, x, y, comp)) return 1; 4520 6313 #endif 6314 + 4521 6315 // test tga last because it's a crappy test! 
6316 + #ifndef STBI_NO_TGA 4522 6317 if (stbi__tga_info(s, x, y, comp)) 4523 6318 return 1; 6319 + #endif 4524 6320 return stbi__err("unknown image type", "Image not of any known type, or corrupt"); 4525 6321 } 4526 6322 ··· 4565 6361 4566 6362 /* 4567 6363 revision history: 4568 - 1.46 (2014-08-26) 4569 - fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG 4570 - 1.45 (2014-08-16) 4571 - fix MSVC-ARM internal compiler error by wrapping malloc 4572 - 1.44 (2014-08-07) 4573 - various warning fixes from Ronny Chevalier 4574 - 1.43 (2014-07-15) 4575 - fix MSVC-only compiler problem in code changed in 1.42 4576 - 1.42 (2014-07-09) 4577 - don't define _CRT_SECURE_NO_WARNINGS (affects user code) 4578 - fixes to stbi__cleanup_jpeg path 4579 - added STBI_ASSERT to avoid requiring assert.h 4580 - 1.41 (2014-06-25) 4581 - fix search&replace from 1.36 that messed up comments/error messages 4582 - 1.40 (2014-06-22) 4583 - fix gcc struct-initialization warning 4584 - 1.39 (2014-06-15) 4585 - fix to TGA optimization when req_comp != number of components in TGA; 4586 - fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite) 4587 - add support for BMP version 5 (more ignored fields) 4588 - 1.38 (2014-06-06) 4589 - suppress MSVC warnings on integer casts truncating values 4590 - fix accidental rename of 'skip' field of I/O 4591 - 1.37 (2014-06-04) 4592 - remove duplicate typedef 4593 - 1.36 (2014-06-03) 4594 - convert to header file single-file library 4595 - if de-iphone isn't set, load iphone images color-swapped instead of returning NULL 4596 - 1.35 (2014-05-27) 4597 - various warnings 4598 - fix broken STBI_SIMD path 4599 - fix bug where stbi_load_from_file no longer left file pointer in correct place 4600 - fix broken non-easy path for 32-bit BMP (possibly never used) 4601 - TGA optimization by Arseny Kapoulkine 4602 - 1.34 (unknown) 4603 - use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case 
4604 - 1.33 (2011-07-14) 4605 - make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements 4606 - 1.32 (2011-07-13) 4607 - support for "info" function for all supported filetypes (SpartanJ) 4608 - 1.31 (2011-06-20) 4609 - a few more leak fixes, bug in PNG handling (SpartanJ) 4610 - 1.30 (2011-06-11) 4611 - added ability to load files via callbacks to accomidate custom input streams (Ben Wenger) 4612 - removed deprecated format-specific test/load functions 4613 - removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway 4614 - error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha) 4615 - fix inefficiency in decoding 32-bit BMP (David Woo) 4616 - 1.29 (2010-08-16) 4617 - various warning fixes from Aurelien Pocheville 4618 - 1.28 (2010-08-01) 4619 - fix bug in GIF palette transparency (SpartanJ) 4620 - 1.27 (2010-08-01) 4621 - cast-to-stbi_uc to fix warnings 4622 - 1.26 (2010-07-24) 4623 - fix bug in file buffering for PNG reported by SpartanJ 4624 - 1.25 (2010-07-17) 4625 - refix trans_data warning (Won Chun) 4626 - 1.24 (2010-07-12) 4627 - perf improvements reading from files on platforms with lock-heavy fgetc() 4628 - minor perf improvements for jpeg 4629 - deprecated type-specific functions so we'll get feedback if they're needed 4630 - attempt to fix trans_data warning (Won Chun) 4631 - 1.23 fixed bug in iPhone support 4632 - 1.22 (2010-07-10) 4633 - removed image *writing* support 4634 - stbi_info support from Jetro Lauha 4635 - GIF support from Jean-Marc Lienher 4636 - iPhone PNG-extensions from James Brown 4637 - warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. 
Janez (U+017D)emva) 4638 - 1.21 fix use of 'stbi_uc' in header (reported by jon blow) 4639 - 1.20 added support for Softimage PIC, by Tom Seddon 4640 - 1.19 bug in interlaced PNG corruption check (found by ryg) 4641 - 1.18 2008-08-02 4642 - fix a threading bug (local mutable static) 4643 - 1.17 support interlaced PNG 4644 - 1.16 major bugfix - stbi__convert_format converted one too many pixels 4645 - 1.15 initialize some fields for thread safety 4646 - 1.14 fix threadsafe conversion bug 4647 - header-file-only version (#define STBI_HEADER_FILE_ONLY before including) 4648 - 1.13 threadsafe 4649 - 1.12 const qualifiers in the API 4650 - 1.11 Support installable IDCT, colorspace conversion routines 4651 - 1.10 Fixes for 64-bit (don't use "unsigned long") 4652 - optimized upsampling by Fabian "ryg" Giesen 4653 - 1.09 Fix format-conversion for PSD code (bad global variables!) 4654 - 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz 4655 - 1.07 attempt to fix C++ warning/errors again 4656 - 1.06 attempt to fix C++ warning/errors again 4657 - 1.05 fix TGA loading to return correct *comp and use good luminance calc 4658 - 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free 4659 - 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR 4660 - 1.02 support for (subset of) HDR files, float interface for preferred access to them 4661 - 1.01 fix bug: possible bug in handling right-side up bmps... 
not sure 4662 - fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all 4663 - 1.00 interface to zlib that skips zlib header 4664 - 0.99 correct handling of alpha in palette 4665 - 0.98 TGA loader by lonesock; dynamically add loaders (untested) 4666 - 0.97 jpeg errors on too large a file; also catch another malloc failure 4667 - 0.96 fix detection of invalid v value - particleman@mollyrocket forum 4668 - 0.95 during header scan, seek to markers in case of padding 4669 - 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same 4670 - 0.93 handle jpegtran output; verbose errors 4671 - 0.92 read 4,8,16,24,32-bit BMP files of several formats 4672 - 0.91 output 24-bit Windows 3.0 BMP files 4673 - 0.90 fix a few more warnings; bump version number to approach 1.0 4674 - 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd 4675 - 0.60 fix compiling as c++ 4676 - 0.59 fix warnings: merge Dave Moore's -Wall fixes 4677 - 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian 4678 - 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available 4679 - 0.56 fix bug: zlib uncompressed mode len vs. 
nlen 4680 - 0.55 fix bug: restart_interval not initialized to 0 4681 - 0.54 allow NULL for 'int *comp' 4682 - 0.53 fix bug in png 3->4; speedup png decoding 4683 - 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments 4684 - 0.51 obey req_comp requests, 1-component jpegs return as 1-component, 4685 - on 'test' only check type, not whether we support this variant 4686 - 0.50 first released version 6364 + 2.08 (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA 6365 + 2.07 (2015-09-13) fix compiler warnings 6366 + partial animated GIF support 6367 + limited 16-bit PSD support 6368 + #ifdef unused functions 6369 + bug with < 92 byte PIC,PNM,HDR,TGA 6370 + 2.06 (2015-04-19) fix bug where PSD returns wrong '*comp' value 6371 + 2.05 (2015-04-19) fix bug in progressive JPEG handling, fix warning 6372 + 2.04 (2015-04-15) try to re-enable SIMD on MinGW 64-bit 6373 + 2.03 (2015-04-12) extra corruption checking (mmozeiko) 6374 + stbi_set_flip_vertically_on_load (nguillemot) 6375 + fix NEON support; fix mingw support 6376 + 2.02 (2015-01-19) fix incorrect assert, fix warning 6377 + 2.01 (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2 6378 + 2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG 6379 + 2.00 (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg) 6380 + progressive JPEG (stb) 6381 + PGM/PPM support (Ken Miller) 6382 + STBI_MALLOC,STBI_REALLOC,STBI_FREE 6383 + GIF bugfix -- seemingly never worked 6384 + STBI_NO_*, STBI_ONLY_* 6385 + 1.48 (2014-12-14) fix incorrectly-named assert() 6386 + 1.47 (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb) 6387 + optimize PNG (ryg) 6388 + fix bug in interlaced PNG with user-specified channel count (stb) 6389 + 1.46 (2014-08-26) 6390 + fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG 6391 + 1.45 (2014-08-16) 6392 + fix MSVC-ARM internal compiler error by wrapping malloc 6393 + 1.44 (2014-08-07) 6394 + various warning 
fixes from Ronny Chevalier 6395 + 1.43 (2014-07-15) 6396 + fix MSVC-only compiler problem in code changed in 1.42 6397 + 1.42 (2014-07-09) 6398 + don't define _CRT_SECURE_NO_WARNINGS (affects user code) 6399 + fixes to stbi__cleanup_jpeg path 6400 + added STBI_ASSERT to avoid requiring assert.h 6401 + 1.41 (2014-06-25) 6402 + fix search&replace from 1.36 that messed up comments/error messages 6403 + 1.40 (2014-06-22) 6404 + fix gcc struct-initialization warning 6405 + 1.39 (2014-06-15) 6406 + fix to TGA optimization when req_comp != number of components in TGA; 6407 + fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite) 6408 + add support for BMP version 5 (more ignored fields) 6409 + 1.38 (2014-06-06) 6410 + suppress MSVC warnings on integer casts truncating values 6411 + fix accidental rename of 'skip' field of I/O 6412 + 1.37 (2014-06-04) 6413 + remove duplicate typedef 6414 + 1.36 (2014-06-03) 6415 + convert to header file single-file library 6416 + if de-iphone isn't set, load iphone images color-swapped instead of returning NULL 6417 + 1.35 (2014-05-27) 6418 + various warnings 6419 + fix broken STBI_SIMD path 6420 + fix bug where stbi_load_from_file no longer left file pointer in correct place 6421 + fix broken non-easy path for 32-bit BMP (possibly never used) 6422 + TGA optimization by Arseny Kapoulkine 6423 + 1.34 (unknown) 6424 + use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case 6425 + 1.33 (2011-07-14) 6426 + make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements 6427 + 1.32 (2011-07-13) 6428 + support for "info" function for all supported filetypes (SpartanJ) 6429 + 1.31 (2011-06-20) 6430 + a few more leak fixes, bug in PNG handling (SpartanJ) 6431 + 1.30 (2011-06-11) 6432 + added ability to load files via callbacks to accomidate custom input streams (Ben Wenger) 6433 + removed deprecated format-specific test/load functions 6434 + removed support for 
installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway 6435 + error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha) 6436 + fix inefficiency in decoding 32-bit BMP (David Woo) 6437 + 1.29 (2010-08-16) 6438 + various warning fixes from Aurelien Pocheville 6439 + 1.28 (2010-08-01) 6440 + fix bug in GIF palette transparency (SpartanJ) 6441 + 1.27 (2010-08-01) 6442 + cast-to-stbi_uc to fix warnings 6443 + 1.26 (2010-07-24) 6444 + fix bug in file buffering for PNG reported by SpartanJ 6445 + 1.25 (2010-07-17) 6446 + refix trans_data warning (Won Chun) 6447 + 1.24 (2010-07-12) 6448 + perf improvements reading from files on platforms with lock-heavy fgetc() 6449 + minor perf improvements for jpeg 6450 + deprecated type-specific functions so we'll get feedback if they're needed 6451 + attempt to fix trans_data warning (Won Chun) 6452 + 1.23 fixed bug in iPhone support 6453 + 1.22 (2010-07-10) 6454 + removed image *writing* support 6455 + stbi_info support from Jetro Lauha 6456 + GIF support from Jean-Marc Lienher 6457 + iPhone PNG-extensions from James Brown 6458 + warning-fixes from Nicolas Schulz and Janez Zemva (i.stbi__err. 
Janez (U+017D)emva) 6459 + 1.21 fix use of 'stbi_uc' in header (reported by jon blow) 6460 + 1.20 added support for Softimage PIC, by Tom Seddon 6461 + 1.19 bug in interlaced PNG corruption check (found by ryg) 6462 + 1.18 (2008-08-02) 6463 + fix a threading bug (local mutable static) 6464 + 1.17 support interlaced PNG 6465 + 1.16 major bugfix - stbi__convert_format converted one too many pixels 6466 + 1.15 initialize some fields for thread safety 6467 + 1.14 fix threadsafe conversion bug 6468 + header-file-only version (#define STBI_HEADER_FILE_ONLY before including) 6469 + 1.13 threadsafe 6470 + 1.12 const qualifiers in the API 6471 + 1.11 Support installable IDCT, colorspace conversion routines 6472 + 1.10 Fixes for 64-bit (don't use "unsigned long") 6473 + optimized upsampling by Fabian "ryg" Giesen 6474 + 1.09 Fix format-conversion for PSD code (bad global variables!) 6475 + 1.08 Thatcher Ulrich's PSD code integrated by Nicolas Schulz 6476 + 1.07 attempt to fix C++ warning/errors again 6477 + 1.06 attempt to fix C++ warning/errors again 6478 + 1.05 fix TGA loading to return correct *comp and use good luminance calc 6479 + 1.04 default float alpha is 1, not 255; use 'void *' for stbi_image_free 6480 + 1.03 bugfixes to STBI_NO_STDIO, STBI_NO_HDR 6481 + 1.02 support for (subset of) HDR files, float interface for preferred access to them 6482 + 1.01 fix bug: possible bug in handling right-side up bmps... 
not sure 6483 + fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all 6484 + 1.00 interface to zlib that skips zlib header 6485 + 0.99 correct handling of alpha in palette 6486 + 0.98 TGA loader by lonesock; dynamically add loaders (untested) 6487 + 0.97 jpeg errors on too large a file; also catch another malloc failure 6488 + 0.96 fix detection of invalid v value - particleman@mollyrocket forum 6489 + 0.95 during header scan, seek to markers in case of padding 6490 + 0.94 STBI_NO_STDIO to disable stdio usage; rename all #defines the same 6491 + 0.93 handle jpegtran output; verbose errors 6492 + 0.92 read 4,8,16,24,32-bit BMP files of several formats 6493 + 0.91 output 24-bit Windows 3.0 BMP files 6494 + 0.90 fix a few more warnings; bump version number to approach 1.0 6495 + 0.61 bugfixes due to Marc LeBlanc, Christopher Lloyd 6496 + 0.60 fix compiling as c++ 6497 + 0.59 fix warnings: merge Dave Moore's -Wall fixes 6498 + 0.58 fix bug: zlib uncompressed mode len/nlen was wrong endian 6499 + 0.57 fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available 6500 + 0.56 fix bug: zlib uncompressed mode len vs. nlen 6501 + 0.55 fix bug: restart_interval not initialized to 0 6502 + 0.54 allow NULL for 'int *comp' 6503 + 0.53 fix bug in png 3->4; speedup png decoding 6504 + 0.52 png handles req_comp=3,4 directly; minor cleanup; jpeg comments 6505 + 0.51 obey req_comp requests, 1-component jpegs return as 1-component, 6506 + on 'test' only check type, not whether we support this variant 6507 + 0.50 (2006-11-19) 6508 + first released version 4687 6509 */
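Note on the PNM loader added in this diff: `stbi__pnm_info` reads the two-byte `P5`/`P6` identifier, then three whitespace-separated decimal integers (width, height, max value), and rejects a max value above 255. The sketch below restates that flow over a plain memory buffer. It is not the library's code: `pnm_cursor`, `pnm_peek`, `pnm_skip_ws`, `pnm_read_int` and `pnm_parse_header` are hypothetical names introduced for illustration, and the sketch is deliberately stricter than the diff (it also rejects zero dimensions and integer overflow). Like the loader, it assumes a header without comments.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Hypothetical read cursor over an in-memory buffer; not part of stb_image. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
} pnm_cursor;

/* Returns the next byte without consuming it, or -1 at end of buffer. */
static int pnm_peek(const pnm_cursor *c)
{
    return c->pos < c->len ? c->data[c->pos] : -1;
}

static int pnm_is_space(int ch)
{
    return ch == ' ' || ch == '\t' || ch == '\n' ||
           ch == '\v' || ch == '\f' || ch == '\r';
}

static void pnm_skip_ws(pnm_cursor *c)
{
    while (pnm_is_space(pnm_peek(c)))
        c->pos++;
}

/* Reads a run of decimal digits; returns 0 if none, -1 on overflow. */
static int pnm_read_int(pnm_cursor *c)
{
    int value = 0;
    int ch;
    while ((ch = pnm_peek(c)) >= '0' && ch <= '9') {
        if (value > (INT_MAX - 9) / 10)
            return -1;
        value = value * 10 + (ch - '0');
        c->pos++;
    }
    return value;
}

/* Returns 1 and fills width/height/comp for a binary PGM (P5) or PPM (P6)
   header with max value <= 255; returns 0 for anything else. */
static int pnm_parse_header(const unsigned char *data, size_t len,
                            int *width, int *height, int *comp)
{
    pnm_cursor c;
    int type, maxv;

    assert(data && width && height && comp);
    c.data = data;
    c.len = len;
    c.pos = 0;

    if (pnm_peek(&c) != 'P') return 0;
    c.pos++;
    type = pnm_peek(&c);
    if (type != '5' && type != '6') return 0;   /* P2/P3 (ASCII) unsupported */
    c.pos++;
    *comp = (type == '6') ? 3 : 1;              /* PPM is RGB, PGM is gray */

    pnm_skip_ws(&c);
    *width = pnm_read_int(&c);
    pnm_skip_ws(&c);
    *height = pnm_read_int(&c);
    pnm_skip_ws(&c);
    maxv = pnm_read_int(&c);

    if (*width <= 0 || *height <= 0) return 0;
    if (maxv <= 0 || maxv > 255) return 0;      /* 16-bit samples unsupported */
    return 1;
}
```

A caller would pass the first bytes of a file and, on success, expect `width * height * comp` bytes of raw sample data to follow the single whitespace byte after the max value.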