diff --git a/.gitignore b/.gitignore
index 258b4d5..f174fc5 100644
--- a/.gitignore
+++ b/.gitignore
@@ -11,4 +11,6 @@
 /neo/.vs*
 /neo/out*
 /neo/build*
+/.vs*
 
+/neo/CMakeSettings.json
diff --git a/Changelog.md b/Changelog.md
new file mode 100644
index 0000000..e52c1b4
--- /dev/null
+++ b/Changelog.md
@@ -0,0 +1,205 @@
+dhewm3 Changelog
+=================
+
+Note: Numbers starting with a "#" like #330 refer to the bug report with that number
+      at https://github.com/dhewm/dhewm3/issues/
+
+1.5.2 (WIP)
+------------------------------------------------------------------------
+
+* Gamma and Brightness are now applied in the shaders instead of by setting hardware gamma.  
+  Can be disabled (so hardware gamma is used again) with `r_gammaInShaders 0`
+* Improvements for (Windows-only) MFC-based tools:
+    - Added the script debugger! (thanks *HarrievG*!)  
+      Original Doom3 didn't have it (Quake4 did), but the Doom3 GPL source contained
+      most of it. *HarrievG* implemented the missing parts and we added some new
+      features. It can even be used over the network, and while the client part
+      (the debugger GUI) is Windows-only, the server runs on all supported
+      platforms, so you can debug a game running on Linux or macOS, for example.  
+      Relevant CVars for network debugging are:
+      `com_enableDebuggerServer` and `com_dbgClientAdr` and `com_dbgServerAdr`.  
+      To debug the running game on the same PC, just enter `debugger` in the console.
+    - All tools can now be built in 64bit (thanks *raynorpat*!)
+    - HighDPI support (thanks *HarrievG*!)
+    - PDAEditor works now
+    - Additional bugfixes
+* Cycle through multiple Quicksave slots instead of immediately overwriting the last
+  Quicksave. The `com_numQuicksaves` CVar allows setting the number of Quicksave slots (#392)
+* Make r_locksurfaces work (#357)  
+  It doesn't do exactly what its description and name suggest: it renders
+  everything that is *currently* visible from the position/view the player had
+  when setting `r_locksurfaces 1`. Originally it was supposed to render exactly
+  the surfaces that *were* visible then, but I couldn't get that to work.  
+  This is pretty similar, but there may be differences with opened doors and such.
+* Keyboard input improvements (mostly SDL2-only):
+    - Support (hopefully) all keyboard keys on all kinds of keyboard layouts
+      by using scancodes for otherwise unknown keys
+    - Support typing in non-ASCII characters, if supported by Doom3 (it supports ISO-8859-1)
+    - Support the clipboard also on non-Windows platforms  
+      You can paste code from the clipboard into the console or other edit fields
+      with `Shift+Insert`
+    - Explicit support for Right Ctrl, Alt and Shift keys  
+      (can be bound to different actions than their left counterparts)
+    - Added `in_grabKeyboard` CVar to make sure dhewm3 gets *all* keyboard input  
+      Prevents the Windows-key or Alt-Tab or whatever from taking focus from the game
+    - Added `in_ignoreConsoleKey` - if set to `1`, the console is only opened with
+      Shift+Esc, and the "console key" (that key between Esc, 1 and Tab) can be freely
+      bound to an action (and its char can be typed in the console without closing it).
+    - Added (SDL2-only) "auto" option for `in_kbd`: When not disabling the console key,
+      dhewm3 will try to automatically detect it if `in_kbd` is set to "auto" (now default)
+* `s_alReverbGain` CVar to reduce EFX reverb effect intensity (#365)
+* Pause (looped) sounds when entering menu (#330)
+* Fixes for looped sounds (#390)
+* Replace libjpeg with stb_image and libogg/libvorbis(file) with stb_vorbis
+    - Now the only required external dependencies should be OpenAL, SDL, zlib
+      and optionally libCURL (and of course the C and C++ runtimes)
+* (Optionally) use libbacktrace on non-Windows platforms for more useful
+  backtraces in case of crashes (usually linked statically)
+* Fixed a deadlock (freeze) on Windows when printing messages from another thread
+* Fixed endless loop (game locking up at startup) if graphics settings couldn't be applied (#386)
+* Fixed some warnings and uninitialized variables (thanks *turol*!)
+* Work around dmap bug caused by GCC using FMA "optimizations" (#147)
+* Prevent dhewm3 from being run as root on Unix-like systems to improve security
+* Replaced most usages of `strncpy()` with something safer to prevent buffer overflows
+  (remaining cases should be safe).
+    - Just a precaution, I don't know if any of them could actually be exploited,
+      but there were some compiler warnings in newer GCC versions.
+* Console output is now logged to `dhewm3log.txt` (last log is renamed to `dhewm3log-old.txt`)
+    - On Windows it's in `My Documents/My Games/dhewm3/`
+    - On Mac it's in `$HOME/Library/Application Support/dhewm3/`
+    - On other Unix-like systems like Linux it's in `$XDG_DATA_HOME/dhewm3/`
+      (usually `$HOME/.local/share/dhewm3/`)
+
+
+1.5.1 (2021-03-14)
+------------------------------------------------------------------------
+
+* The (Windows-only) integrated **editing tools** of Doom3 are back!
+    - They can only be built with non-Express versions of Visual Studio (tested with
+      Community Editions of VS2013 and VS2017) and can be disabled via CMake
+    - Official dhewm3 Windows binaries are built with tools enabled, of course.
+    - Only supports 32bit builds, because in contrast to the rest of dhewm3's code,
+      the tool code is not 64bit compatible at all.
+    - Based on Code from the dhewm3 branch of SteelStorm2, thanks to *Motorsep* for donating that code!
+    - Has some bugfixes over the state in Doom3 1.3.1, like selecting a material
+      in the Particle Editor doesn't break the viewport of the game any more.
+    - Thanks to *Tommy Hanusa* for testing and reporting some issues (that were subsequently fixed)!
+* Update savegame format (see #303 and #344)
+    - old savegames should still work, but new savegames can't be loaded with older versions of dhewm3!
+* Uploaded updated builds of Mod DLLs (esp. Dentonmod should run a lot more stable now).  
+  Added Mod DLLs of [LibreCoop](https://www.moddb.com/mods/librecoop-dhewm3-coop)
+  and [The Lost Mission](https://www.moddb.com/mods/the-lost-mission).  
+  See https://dhewm3.org/mods.html for more details.
+* dhewm3 now supports the **Doom3 Demo** gamedata
+    - See [below](#using-the-doom3-demo-gamedata) for installation instructions
+    - This is based on *Gabriel Cuvillier's* code for [D3Wasm](http://www.continuation-labs.com/projects/d3wasm/),
+      which ports dhewm3 to web browsers, thanks!
+* Create the game window on the display the cursor is currently on (when using more than one display)
+* Added `r_fullscreenDesktop` CVar to set if fullscreen mode should be "classic"
+  or "Desktop" which means a borderless window at current desktop resolution
+* Fullscreen modes that are not at the current desktop resolution should work better now
+    - including nvidia DSR / AMD VSR; for that you might have to use `dhewm3_notools.exe`,
+      as DSR/VSR seem to be incompatible with applications that use MFC
+      (the GUI framework used for the Doom3 tools like the D3Radiant)
+* Several sound-related bugfixes:
+    - Lags in starting to play a sound which for example caused the machinegun or
+      plasmagun sounds to stutter have been eliminated (#141)
+    - Trying to reset disconnected OpenAL devices, this esp. helps with display audio
+      on Intel GPUs on Windows, when switching to fullscreen (#209)
+    - Looping .wav sounds with leadin now work (#291)
+    - The game still works if no sound devices are available at all (#292)
+    - Make "idSoundCache: error unloading data from OpenAL hardware buffer" a Warning
+      instead of an Error so it doesn't terminate the game (by *Corey O'Connor*, #235)
+* Restore "Carmack's Reverse" Z-Fail stencil shadows; use `glStencilOpSeparate()` if available
+    - That bloody patent finally expired last October: https://patents.google.com/patent/US6384822B1/en
+    - This doesn't seem to make a visual or performance difference on any hardware
+      I tried (including Raspberry Pi 4), so this is mostly out of principle
+    - Based on Code by [*Leith Bade*](https://github.com/ljbade/doom3.gpl/commit/d4de024341e79e0ac1dfb54fb528859f8ccea605)
+      and [*Pat Raynor*](https://github.com/raynorpat/Doom3/blob/2933cb554587aea546c2df1fdf086204d4ca363d/neo/renderer/draw_stencilshadow.cpp#L147-L182).
+    - The `r_useCarmacksReverse` and `r_useStencilOpSeparate` CVars allow switching both things
+      on/off for comparison
+* New CVar `g_hitEffect`: If set to `0`, the player camera damage effects (like double-vision and extreme tilt)
+  when being hit are disabled (by *dobosken*, #279).
+* (On Windows) stdout.txt and stderr.txt are not saved next to the binary anymore,
+  but in `My Documents/My Games/dhewm3/`, like save games, because the binary dir
+  might not be writable and dhewm3 wouldn't start properly then
+* Fix lingering messages in HUD after loading savegame
+    - Sometimes the "Game saved..." message didn't go away after loading a savegame
+      (when having saved while it still was showing from last save)
+* Fixed clipping bug in delta1 which sometimes occurred and made climbing some
+  ladders impossible (#328)
+* Improve compatibility with some custom scripts
+  ("t->c->value.argSize == func->parmTotal" Assertion; see #303)
+* Registering multiplayer servers at id's master-server fixed, so they can be
+  found in the multiplayer menu (by *Stradex*, #293)
+* Support for reproducible builds by setting the CMake option `REPRODUCIBLE_BUILD`.
+* Should build on recent versions of macOS, also on Apple Silicon (thanks *Dave Nicolson* and *Petter Uvesten*).
+* Proper handling of paths with dots in directory names (#299, #301)
+    - Some string functions that are intended to find/cut off/replace/... file extensions
+      cut off the whole path at dots.
+    - Especially fixes loading and saving maps from such paths in the builtin D3Radiant level editor
+* `idFileSystemLocal::ListMods()` doesn't search `/` or `C:\` anymore
+  (it did so if one of the paths, like `fs_cdpath`, was empty)
+* Don't use translation in Autosave filenames (#305)
+    - In the Spanish translation all the Alpha Lab autosaves got the same name;
+      now the autosave name is based on the map name instead, which is distinct
+
+
+1.5.0 (2018-12-15)
+------------------------------------------------------------------------
+
+* Support for some Mods via [custom SDK](https://github.com/dhewm/dhewm3-sdk):
+  Classic Doom3, Fitz Packerton, HardQore2, Denton's Enhanced Doom3 and Rivensin.
+    - See https://dhewm3.org/mods.html for more information.
+    - This has also broken backwards compatibility with 1.4.x game DLLs,
+      that's why this version will be 1.5.0 and not 1.4.2.
+* Supports High DPI displays on Windows now
+* Scale menus, fullscreen videos and the PDA to 4:3 (with black bars left/right) on  
+  widescreen displays so they don't look stretched/distorted.
+  Can be disabled with `r_scaleMenusTo43 0`.  
+  No, this unfortunately can't be done for the HUD (except for the crosshair),
+  because it also handles fullscreen effects (for example when receiving damage),
+  and those would look bad with black/empty bars on left/right.
+* Commandline option to display some help on supported commandline arguments:
+  `-h` or `--help` or `-help` or `/?`
+* ~~(Experimental) uncapped framerate, enable by entering `com_fixedTic -1`~~
+  ~~in the console (can be set back with `com_fixedTic 0`).~~  
+  (this turned out to be broken, see #261)
+* Support for the AROS and OpenBSD operating systems
+* Several bugfixes
+
+
+1.4.1 (2016-06-19)
+------------------------------------------------------------------------
+
+* Fixed some (kinda rare) crashes due to assertion errors, especially observed in the last
+  boss fights of both Doom3 and the Resurrection of Evil Addon.
+* Improved compatibility with AZERTY keyboards (the row of keys with 1...9, 0 is now usable)
+* Fixed a crash (at least on FreeBSD) when loading Resurrection of Evil's last level
+* Compatibility with Microsoft Visual Studio 2015
+* Video resolutions in menu now sorted, added 2880x1800
+* Support for up to 8 mouse buttons (on Linux this needs SDL2 2.0.4 or newer to work)
+
+
+1.4.0 (2015-10-09)
+------------------------------------------------------------------------
+
+The first dhewm3 release. Changes compared to Doom3 1.3.1 as open sourced
+on 2011-11-22 (most work done by *dhewg*):
+
+* Use CMake as build system instead of Visual Studio and Xcode solutions and SCons etc
+* Replaced lots of platform-specific code with libSDL
+* Use OpenAL as only soundbackend on all platforms (instead of only on Windows)  
+  Ported EAX for sound effects to the cross-platform OpenAL EFX extension
+* Made code 64bit compatible (except for Windows-only MFC-based tools, which were disabled
+  because no free or at least no-cost compiler with MFC support was available at the time)
+* Also made it compatible with the MinGW compiler
+* Write savegames, configs, screenshots etc in user-specific directories
+  instead of installation directory on all platforms
+* Fixed lots of bugs and compiler warnings
+* Removed support for binary .pk4's, only support loading .dll/.so/.dylib for
+  game-code (mods) directly
+* Support (and automatically detect) arbitrary aspect ratios
+* Support more resolutions, inject them into the settings menu
+* Open ingame-console with Shift+Esc (=> works with all keyboard layouts)
+* Most probably much more I forgot...
diff --git a/README.md b/README.md
index 12e06f4..c81e2fe 100644
--- a/README.md
+++ b/README.md
@@ -35,6 +35,8 @@ Compared to the original _DOOM 3_, the changes of _dhewm 3_ worth mentioning are
 - A portable build system based on CMake
 - (Cross-)compilation with MinGW-w64
 
+See [Changelog.md](./Changelog.md) for a more complete changelog.
+
 
 # GENERAL NOTES
 
@@ -52,13 +54,7 @@ http://store.steampowered.com/app/9050/
 
 http://store.steampowered.com/app/9070/
 
-You can also buy Steam keys at the Humble Store:
-
-https://www.humblebundle.com/store/p/doom3_storefront
-
-https://www.humblebundle.com/store/p/doom3_resofevil_storefront
-
-Note that neither Steam nor the Humble Store offer the *Resurrection of Evil* addon
+Note that Steam does not offer the *Resurrection of Evil* addon
 for German customers (or at least people with German IP adresses).
 
 ## Compiling
@@ -68,22 +64,27 @@ The build system is based on CMake: http://cmake.org/
 Required libraries are not part of the tree. These are:
 
 - zlib
-- libjpeg (v8)
-- libogg
-- libvorbis
-- libvorbisfile (may be part of libvorbis)
 - OpenAL (OpenAL Soft required, Creative's and Apple's versions are made of fail)
 - SDL v1.2 or 2.0 (2.0 recommended)
 - libcurl (optional, required for server downloads)
+- Optionally, on non-Windows: libbacktrace
+  - sometimes (e.g. on Debian-based distros like Ubuntu) it's part of libgcc (=> always available),
+    sometimes (e.g. Arch Linux, openSUSE) it's in a separate package
+  - If this is available, dhewm3 prints more useful backtraces if it crashes
+- Also, if you're not building recent dhewm3 code from git (but 1.5.1 or older):
+  - libjpeg (v8)
+  - libogg, libvorbis, libvorbisfile (may be part of libvorbis)
 
 For UNIX-like systems, these libraries need to be installed (including the
 developer files). It is recommended to use the software management tools of
 your OS (apt, dnf, portage, BSD ports, [Homebrew for macOS](http://brew.sh), ...).
 
-For Windows, there are two options:
+For Windows, there are three options:
 
 - Use the provided binaries (recommended, see below)
 - Compile these libraries yourself
+- Use [vcpkg](https://vcpkg.io/) to install the dependencies
+    - Remember to set `CMAKE_TOOLCHAIN_FILE` as described in their [Getting Started Guide](https://vcpkg.io/en/getting-started.html)
 
 Create a distinct build folder outside of this source repository and issue
 the cmake command there, pointing it at the neo/ folder from this repository:
@@ -94,6 +95,10 @@ macOS users need to point CMake at OpenAL Soft (better solutions welcome):
 
 `cmake -DOPENAL_LIBRARY=/usr/local/opt/openal-soft/lib/libopenal.dylib -DOPENAL_INCLUDE_DIR=/usr/local/opt/openal-soft/include /path/to/repository/neo`
 
+Newer versions of Homebrew install openal-soft to another directory, so use this instead:
+
+`cmake -DOPENAL_LIBRARY="/opt/homebrew/opt/openal-soft/lib/libopenal.dylib" -DOPENAL_INCLUDE_DIR="/opt/homebrew/opt/openal-soft/include" /path/to/repo/neo`
+
 ## Using the provided Windows binaries
 
 Get a clone of the latest binaries here: https://github.com/dhewm/dhewm3-libs
@@ -289,6 +294,19 @@ neo/idlib/hashing/CRC32.cpp
 
 Copyright (C) 1995-1998 Mark Adler
 
+## stb_image and stb_vorbis
+
+neo/renderer/stb_image.h
+neo/sound/stb_vorbis.h
+
+Used to decode JPEG and Ogg Vorbis files.
+
+from https://github.com/nothings/stb/
+
+Copyright (c) 2017 Sean Barrett
+
+Released under MIT License and Unlicense (Public Domain)
+
 ## Brandelf utility
 
 neo/sys/linux/setup/brandelf.c
diff --git a/debian/changelog b/debian/changelog
index 703b586..dddde04 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,3 +1,9 @@
+dhewm3 (1.5.1+git20220227.1.adad73c-1) UNRELEASED; urgency=low
+
+  * New upstream snapshot.
+
+ -- Debian Janitor <janitor@jelmer.uk>  Thu, 31 Mar 2022 10:49:54 -0000
+
 dhewm3 (1.5.1+dfsg-1) unstable; urgency=medium
 
   * New upstream release.
diff --git a/debian/patches/01-changedatadir.patch b/debian/patches/01-changedatadir.patch
index 0a22ce8..ca509cb 100644
--- a/debian/patches/01-changedatadir.patch
+++ b/debian/patches/01-changedatadir.patch
@@ -3,9 +3,11 @@ Author: Tobias Frost <tobi@debian.org>
 Last-Update: 2015-02-14
 ---
 This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
---- a/neo/CMakeLists.txt
-+++ b/neo/CMakeLists.txt
-@@ -326,7 +326,7 @@
+Index: dhewm3/neo/CMakeLists.txt
+===================================================================
+--- dhewm3.orig/neo/CMakeLists.txt
++++ dhewm3/neo/CMakeLists.txt
+@@ -413,7 +413,7 @@ endif()
  
  set(bindir		"${CMAKE_INSTALL_FULL_BINDIR}")
  set(libdir		"${CMAKE_INSTALL_FULL_LIBDIR}/dhewm3")
diff --git a/debian/patches/30-SDL2-CMake.patch b/debian/patches/30-SDL2-CMake.patch
index 607dbd3..ee65234 100644
--- a/debian/patches/30-SDL2-CMake.patch
+++ b/debian/patches/30-SDL2-CMake.patch
@@ -5,7 +5,9 @@ Forwarded: yes, https://github.com/dhewm/dhewm3/pull/283
 Last-Update: 2020-04-07
 ---
 This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
---- a/neo/sys/cmake/FindSDL2.cmake
+Index: dhewm3/neo/sys/cmake/FindSDL2.cmake
+===================================================================
+--- dhewm3.orig/neo/sys/cmake/FindSDL2.cmake
 +++ /dev/null
 @@ -1,163 +0,0 @@
 -# Locate SDL2 library
@@ -171,9 +173,11 @@ This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
 -INCLUDE(FindPackageHandleStandardArgs)
 -
 -FIND_PACKAGE_HANDLE_STANDARD_ARGS(SDL2 REQUIRED_VARS SDL2_LIBRARY SDL2_INCLUDE_DIR)
---- a/neo/CMakeLists.txt
-+++ b/neo/CMakeLists.txt
-@@ -142,8 +142,8 @@
+Index: dhewm3/neo/CMakeLists.txt
+===================================================================
+--- dhewm3.orig/neo/CMakeLists.txt
++++ dhewm3/neo/CMakeLists.txt
+@@ -193,8 +193,8 @@ if (SDL2)
  		set(SDL2_BUILDING_LIBRARY TRUE)
  	endif()
  	find_package(SDL2 REQUIRED)
diff --git a/neo/CMakeLists.txt b/neo/CMakeLists.txt
index 22401c9..92d05c5 100644
--- a/neo/CMakeLists.txt
+++ b/neo/CMakeLists.txt
@@ -50,7 +50,7 @@ option(CORE			"Build the core" ON)
 option(BASE			"Build the base game code" ON)
 option(D3XP			"Build the d3xp game code" ON)
 if(MSVC)
-	option(TOOLS		"Build the tools game code (32bit Windows+Visual Studio+SDL2 only)" OFF)
+	option(TOOLS		"Build the tools game code (Visual Studio+SDL2 only)" OFF)
 endif()
 option(DEDICATED	"Build the dedicated server" OFF)
 option(ONATIVE		"Optimize for the host CPU" OFF)
@@ -67,36 +67,67 @@ endif()
 
 # target cpu
 set(cpu ${CMAKE_SYSTEM_PROCESSOR})
+
+# Originally, ${CMAKE_SYSTEM_PROCESSOR} was supposed to contain the *target* CPU, according to CMake's documentation.
+# As far as I can tell this has always been broken (always returns host CPU) at least on Windows
+# (see e.g. https://cmake.org/pipermail/cmake-developers/2014-September/011405.html) and wasn't reliable on
+# other systems either, for example on Linux with 32bit userland but 64bit kernel it returned the kernel CPU type
+# (e.g. x86_64 instead of i686). Instead of fixing this, CMake eventually updated their documentation in 3.20,
+# now it's officially the same as CMAKE_HOST_SYSTEM_PROCESSOR except when cross-compiling (where it's explicitly set)
+# So we gotta figure out the actual target CPU type ourselves.. (why am I sticking to this garbage buildsystem?)
+if(NOT (CMAKE_SYSTEM_PROCESSOR STREQUAL CMAKE_HOST_SYSTEM_PROCESSOR))
+	# special case: cross-compiling, here CMAKE_SYSTEM_PROCESSOR should be correct, hopefully
+	# (just leave cpu at ${CMAKE_SYSTEM_PROCESSOR})
+elseif(MSVC)
+	message(STATUS "CMAKE_GENERATOR_PLATFORM: ${CMAKE_GENERATOR_PLATFORM}")
+	if(CMAKE_GENERATOR_PLATFORM STREQUAL "Win32")
+		set(cpu "x86")
+	elseif(CMAKE_GENERATOR_PLATFORM STREQUAL "x64")
+		set(cpu "x86_64")
+	elseif(CMAKE_GENERATOR_PLATFORM STREQUAL "ARM")
+		# at least on RPi 32bit, gcc -dumpmachine outputs "arm-linux-gnueabihf",
+		#  so we'll use "arm" there => use the same for 32bit ARM on MSVC
+		set(cpu "arm")
+	elseif(CMAKE_GENERATOR_PLATFORM STREQUAL "ARM64")
+		set(cpu "arm64")
+	else()
+		message(FATAL_ERROR "Unknown Target CPU/platform ${CMAKE_GENERATOR_PLATFORM}")
+	endif()
+	message(STATUS "  => CPU architecture extracted from that: \"${cpu}\"")
+else() # not MSVC and not cross-compiling, assume GCC or clang (-compatible), seems to work for MinGW as well
+	execute_process(COMMAND ${CMAKE_C_COMPILER} "-dumpmachine"
+	                RESULT_VARIABLE cc_dumpmachine_res
+	                OUTPUT_VARIABLE cc_dumpmachine_out)
+	if(cc_dumpmachine_res EQUAL 0)
+		string(STRIP ${cc_dumpmachine_out} cc_dumpmachine_out) # get rid of trailing newline
+		message(STATUS "`${CMAKE_C_COMPILER} -dumpmachine` says: \"${cc_dumpmachine_out}\"")
+		# gcc -dumpmachine and clang -dumpmachine seem to print something like "x86_64-linux-gnu" (gcc)
+		# or "x86_64-pc-linux-gnu" (clang) or "i686-w64-mingw32" (32bit mingw-w64) i.e. starting with the CPU,
+		# then "-" and then OS or whatever - so use everything up to first "-"
+		string(REGEX MATCH "^[^-]+" cpu ${cc_dumpmachine_out})
+		message(STATUS "  => CPU architecture extracted from that: \"${cpu}\"")
+	else()
+		message(WARNING "${CMAKE_C_COMPILER} -dumpmachine failed with error (code) ${cc_dumpmachine_res}")
+		message(WARNING "will use the (sometimes incorrect) CMAKE_SYSTEM_PROCESSOR (${cpu}) to determine D3_ARCH")
+	endif()
+endif()
+
 if(cpu STREQUAL "powerpc")
 	set(cpu "ppc")
+elseif(cpu STREQUAL "aarch64")
+	# "arm64" is more obvious, and some operating systems (like macOS) use it instead of "aarch64"
+	set(cpu "arm64")
 elseif(cpu MATCHES "i.86")
 	set(cpu "x86")
 elseif(cpu MATCHES "[aA][mM][dD]64" OR cpu MATCHES "[xX]64")
 	set(cpu "x86_64")
-endif()
-
-# On Windows ${CMAKE_SYSTEM_PROCESSOR} is broken (always returns host CPU)
-# which has only been reported >7 years ago (https://cmake.org/pipermail/cmake-developers/2014-September/011405.html)
-# so obviously a fix is too much to ask. Here's the special case to make that wonderful platform work;
-# except if it's Windows for ARM(64) I guess, no idea how to detect that properly (I don't own such hardware).
-if(MSVC)
-	if(cpu MATCHES ".*[aA][rR][mM].*")
-		message(FATAL_ERROR "please fix this code to work for Windows on ARM and send a pull request")
-	endif()
-	if(CMAKE_CL_64)
-		set(cpu "x86_64")
-	else()
-		set(cpu "x86")
-	endif()
-elseif(DEFINED ENV{MINGW_CHOST})
-	# looks like it's broken in MinGW32 shells (or 32bit mingw-w64 shells) as well, this should help:
-	message(STATUS "MINGW_CHOST = $ENV{MINGW_CHOST}")
-	if($ENV{MINGW_CHOST} MATCHES "^i.86.*")
-		set(cpu "x86")
-	elseif($ENV{MINGW_CHOST} MATCHES "^x86_64.*")
-		set(cpu "x86_64")
-	else()
-		message(FATAL_ERROR "please fix this code to work for MINGW_CHOST = $ENV{MINGW_CHOST} and send a pull request!")
+elseif(cpu MATCHES "[aA][rR][mM].*") # some kind of arm..
+	# On 32bit Raspbian gcc -dumpmachine returns sth starting with "arm-",
+	# while clang -dumpmachine says "armv6k-..." - try to unify that to "arm"
+	if(CMAKE_SIZEOF_VOID_P EQUAL 8) # sizeof(void*) == 8 => must be arm64
+		set(cpu "arm64")
+	else() # should be 32bit arm then (probably "armv7l" "armv6k" or sth like that)
+		set(cpu "arm")
 	endif()
 endif()
 
@@ -153,21 +184,6 @@ endif()
 find_package(ZLIB REQUIRED)
 include_directories(${ZLIB_INCLUDE_DIRS})
 
-find_package(JPEG REQUIRED)
-include_directories(${JPEG_INCLUDE_DIR})
-
-set(CMAKE_REQUIRED_INCLUDES ${JPEG_INCLUDE_DIR})
-set(CMAKE_REQUIRED_LIBRARIES ${JPEG_LIBRARY})
-
-find_package(OGG REQUIRED)
-include_directories(${OGG_INCLUDE_DIR})
-
-find_package(Vorbis REQUIRED)
-include_directories(${VORBIS_INCLUDE_DIR})
-
-find_package(VorbisFile REQUIRED)
-include_directories(${VORBISFILE_INCLUDE_DIR})
-
 find_package(OpenAL REQUIRED)
 include_directories(${OPENAL_INCLUDE_DIR})
 
@@ -199,8 +215,9 @@ find_package(CURL QUIET)
 if(CURL_FOUND)
 	set(ID_ENABLE_CURL ON)
 	include_directories(${CURL_INCLUDE_DIR})
+	message(STATUS "libcurl found and enabled")
 else()
-	message(STATUS "libcurl not found, server downloads won't be available")
+	message(WARNING "libcurl not found, server downloads won't be available (apart from that dhewm3 will work)")
 	set(ID_ENABLE_CURL OFF)
 	set(CURL_LIBRARY "")
 endif()
@@ -213,7 +230,27 @@ if(MSVC)
 		message(SEND_ERROR "MFC ('Microsoft Foundation Classes for C++') couldn't be found, but is needed for TOOLS!")
 		message(FATAL_ERROR "If you're using VS2013, you'll also need the 'Multibyte MFC Library for Visual Studio 2013': https://www.microsoft.com/en-us/download/details.aspx?id=40770 (VS2015 and 2017 include that in the default MFC package)")
 	endif()
-endif()
+
+else() # not MSVC
+
+if(NOT WIN32)
+	# libbacktrace support - TODO: might work with MinGW? we don't have a crash handler for win32 though..
+	include(CheckCSourceCompiles)
+	set(CMAKE_REQUIRED_LIBRARIES backtrace)
+	check_c_source_compiles( "#include <backtrace.h>
+	int main() { backtrace_create_state(NULL, 0, NULL, NULL); return 0; }" HAVE_LIBBACKTRACE )
+	unset(CMAKE_REQUIRED_LIBRARIES)
+
+	if(HAVE_LIBBACKTRACE)
+		set(sys_libs ${sys_libs} backtrace)
+		add_definitions(-DD3_HAVE_LIBBACKTRACE)
+		message(STATUS "Using libbacktrace")
+	else()
+		message(WARNING "libbacktrace wasn't found. It's not required but recommended, because it provides useful backtraces if dhewm3 crashes")
+	endif()
+endif() # NOT WIN32
+
+endif() # not MSVC
 
 # compiler specific flags
 if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_C_COMPILER_ID STREQUAL "Clang")
@@ -229,14 +266,17 @@ if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_C_COMPILER_ID STREQUAL "Clang")
 	set(CMAKE_C_FLAGS_DEBUG "-g -ggdb -D_DEBUG -O0")
 	set(CMAKE_C_FLAGS_DEBUGALL "-g -ggdb -D_DEBUG")
 	set(CMAKE_C_FLAGS_PROFILE "-g -ggdb -D_DEBUG -O1 -fno-omit-frame-pointer")
-	set(CMAKE_C_FLAGS_RELEASE "-O2 -fno-unsafe-math-optimizations -fno-math-errno -fno-trapping-math -fomit-frame-pointer")
-	set(CMAKE_C_FLAGS_RELWITHDEBINFO "-g -ggdb -O2 -fno-unsafe-math-optimizations -fno-math-errno -fno-trapping-math -fno-omit-frame-pointer")
-	set(CMAKE_C_FLAGS_MINSIZEREL "-Os -fno-unsafe-math-optimizations -fno-math-errno -fno-trapping-math -fomit-frame-pointer")
+	set(CMAKE_C_FLAGS_RELEASE "-O2 -fno-math-errno -fno-trapping-math -fomit-frame-pointer")
+	set(CMAKE_C_FLAGS_RELWITHDEBINFO "-g -ggdb -O2 -fno-math-errno -fno-trapping-math -fno-omit-frame-pointer")
+	set(CMAKE_C_FLAGS_MINSIZEREL "-Os -fno-math-errno -fno-trapping-math -fomit-frame-pointer")
 
 	set(CMAKE_CXX_FLAGS_DEBUGALL ${CMAKE_C_FLAGS_DEBUGALL})
 	set(CMAKE_CXX_FLAGS_PROFILE ${CMAKE_C_FLAGS_PROFILE})
 
 	add_compile_options(-fno-strict-aliasing)
+	# dear idiot compilers, don't fuck up math code with useless FMA "optimizations"
+	# (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100839)
+	add_compile_options(-ffp-contract=off)
 
 	if(NOT AROS)
 		CHECK_CXX_COMPILER_FLAG("-fvisibility=hidden" cxx_has_fvisibility)
@@ -286,8 +326,8 @@ if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_C_COMPILER_ID STREQUAL "Clang")
 		elseif(cpu STREQUAL "ppc")
 			CHECK_CXX_COMPILER_FLAG("-arch ppc" cxx_has_arch_ppc)
 			if(cxx_has_arch_ppc)
-				add_compile_options(-arch ppc)
-				set(ldflags "${ldflags} -arch ppc")
+				add_compile_options(-arch ppc -mone-byte-bool)
+				set(ldflags "${ldflags} -arch ppc -mone-byte-bool")
 			endif()
 
 			add_compile_options(-mmacosx-version-min=10.4)
@@ -308,8 +348,11 @@ if(CMAKE_COMPILER_IS_GNUCC OR CMAKE_C_COMPILER_ID STREQUAL "Clang")
 elseif(MSVC)
 	add_compile_options(/MP) # parallel build (use all cores, or as many as configured in VS)
 	
-	add_compile_options(/W4)
+	add_compile_options(/W3) # TODO: was /W4, caused trouble with VS2019 (and/or its integrated CMake? or only HarrievG's setup?)
 	add_compile_options(/we4840) # treat as error when passing a class to a vararg-function (probably printf-like)
+	# treat several kinds of truncating int<->pointer conversions as errors (for more 64bit-safety)
+	add_compile_options(/we4306 /we4311 /we4312 /we4302)
+	# ignore the following warnings:
 	add_compile_options(/wd4100) # unreferenced formal parameter
 	add_compile_options(/wd4127) # conditional expression is constant
 	add_compile_options(/wd4244) # possible loss of data
@@ -790,6 +833,17 @@ set(src_d3xp
 
 add_globbed_headers(src_d3xp "d3xp")
 
+set(src_debuggerServer
+	tools/debugger/DebuggerBreakpoint.h
+	tools/debugger/DebuggerBreakpoint.cpp
+	tools/debugger/DebuggerServer.h
+	tools/debugger/DebuggerServer.cpp
+	tools/debugger/DebuggerScript.h
+	tools/debugger/DebuggerScript.cpp
+	tools/debugger/DebuggerMessages.h
+	tools/debugger/debugger.cpp
+)
+
 set(src_core
 	${src_renderer}
 	${src_framework}
@@ -851,6 +905,23 @@ if (TOOLS AND MFC_FOUND AND MSVC)
 	# Script editor
 	file(GLOB src_script_editor "tools/script/*.cpp")
 	add_globbed_headers(src_script_editor "tools/script")
+	# Script Debugger
+	set(src_debuggerClient
+		tools/debugger/DebuggerClient.h
+		tools/debugger/DebuggerClient.cpp
+		tools/debugger/DebuggerApp.h
+		tools/debugger/DebuggerApp.cpp
+		tools/debugger/DebuggerQuickWatchDlg.h
+		tools/debugger/DebuggerQuickWatchDlg.cpp
+		tools/debugger/DebuggerWindow.h
+		tools/debugger/DebuggerWindow.cpp
+		tools/debugger/DebuggerFindDlg.h
+		tools/debugger/DebuggerFindDlg.cpp
+	)
+	set(src_script_debugger
+		${src_debuggerServer}
+		${src_debuggerClient}
+	)
 	# sound editor?
 	file(GLOB src_sound_editor "tools/sound/*.cpp")
 	add_globbed_headers(src_sound_editor "tools/sound")
@@ -870,13 +941,17 @@ if (TOOLS AND MFC_FOUND AND MSVC)
 		${src_map_editor}
 		${src_script_editor}
 		${src_sound_editor}
+		${src_script_debugger}
 		"tools/edit_public.h"
 		"tools/edit_gui_common.h"
 		)
 	SET(CMAKE_MFC_FLAG 2)
-	set(TOOLS_DEFINES "ID_ALLOW_TOOLS;__AFXDLL")
+	set(TOOLS_DEFINES "ID_ALLOW_TOOLS;_AFXDLL")
 else()
-	set(src_editor_tools "tools/edit_stub.cpp" "tools/edit_public.h")
+	set(src_editor_tools "tools/edit_stub.cpp" "tools/edit_public.h" "tools/debugger/debugger_common.h")
+	list(APPEND src_editor_tools
+		${src_debuggerServer}
+	)
 endif()
 
 
@@ -1020,11 +1095,7 @@ if(CORE)
 	target_link_libraries(${DHEWM3BINARY}
 		idlib
 		${OPENAL_LIBRARY}
-		${VORBISFILE_LIBRARIES}
-		${VORBIS_LIBRARIES}
-		${OGG_LIBRARIES}
 		${CURL_LIBRARY}
-		${JPEG_LIBRARY}
 		${ZLIB_LIBRARY}
 		${SDLx_LIBRARY}
 		${sys_libs}
@@ -1045,19 +1116,16 @@ if(DEDICATED)
 		${src_stub_openal}
 		${src_stub_gl}
 		${src_sys_base}
+		${src_debuggerServer}
 	)
 	
-	source_group(TREE ${CMAKE_CURRENT_SOURCE_DIR} PREFIX neo FILES ${src_core} ${src_sys_base} ${src_stub_openal} ${src_stub_gl})
+	source_group(TREE ${CMAKE_CURRENT_SOURCE_DIR} PREFIX neo FILES ${src_core} ${src_sys_base} ${src_stub_openal} ${src_stub_gl} ${src_debuggerServer})
 
 	set_target_properties(${DHEWM3BINARY}ded PROPERTIES COMPILE_DEFINITIONS "ID_DEDICATED;__DOOM_DLL__")
 	set_target_properties(${DHEWM3BINARY}ded PROPERTIES LINK_FLAGS "${ldflags}")
 	target_link_libraries(${DHEWM3BINARY}ded
 		idlib
-		${VORBISFILE_LIBRARIES}
-		${VORBIS_LIBRARIES}
-		${OGG_LIBRARIES}
 		${CURL_LIBRARY}
-		${JPEG_LIBRARY}
 		${ZLIB_LIBRARY}
 		${SDLx_LIBRARY}
 		${sys_libs}
diff --git a/neo/d3xp/GameEdit.cpp b/neo/d3xp/GameEdit.cpp
index 0c637bd..5d68b80 100644
--- a/neo/d3xp/GameEdit.cpp
+++ b/neo/d3xp/GameEdit.cpp
@@ -670,7 +670,7 @@ void idEditEntities::DisplayEntities( void ) {
 ===============================================================================
 */
 
-idGameEdit			gameEditLocal;
+idGameEditExt		gameEditLocal;
 idGameEdit *		gameEdit = &gameEditLocal;
 
 
@@ -1146,3 +1146,62 @@ void idGameEdit::MapEntityTranslate( const char *name, const idVec3 &v ) const {
 		}
 	}
 }
+
+
+/***********************************************************************
+
+  Debugger
+
+***********************************************************************/
+
+bool idGameEditExt::IsLineCode( const char *filename, int linenumber ) const {
+	idStr fileStr;
+	idProgram *program = &gameLocal.program;
+	for ( int i = 0; i < program->NumStatements( ); i++ ) 	{
+		fileStr = program->GetFilename( program->GetStatement( i ).file );
+		fileStr.BackSlashesToSlashes( );
+
+		if ( strcmp( filename, fileStr.c_str( ) ) == 0
+			&& program->GetStatement( i ).linenumber == linenumber
+			) {
+			return true;
+		}
+	}
+	return false;
+}
+
+void idGameEditExt::GetLoadedScripts(idStrList** result)
+{
+	(*result)->Clear();
+	idProgram* program = &gameLocal.program;
+
+	for (int i = 0; i < program->NumFilenames(); i++)
+	{
+		(*result)->AddUnique(idStr(program->GetFilename(i)));
+	}
+}
+
+void idGameEditExt::MSG_WriteScriptList(idBitMsg* msg)
+{
+	idProgram* program = &gameLocal.program;
+
+	msg->WriteInt(program->NumFilenames());
+	for (int i = 0; i < program->NumFilenames(); i++)
+	{
+		idStr file = program->GetFilename(i);
+		//fix this. it seems that scripts triggered by the runtime are stored with a wrong path
+		//they use // instead of '\'
+		file.BackSlashesToSlashes();
+		msg->WriteString(file);
+	}
+}
+
+const char* idGameEditExt::GetFilenameForStatement(idProgram* program, int index) const
+{
+	return program->GetFilenameForStatement(index);
+}
+
+int idGameEditExt::GetLineNumberForStatement(idProgram* program, int index) const
+{
+	return program->GetLineNumberForStatement(index);
+}
diff --git a/neo/d3xp/Game_local.cpp b/neo/d3xp/Game_local.cpp
index 08aaf9b..9e1e866 100644
--- a/neo/d3xp/Game_local.cpp
+++ b/neo/d3xp/Game_local.cpp
@@ -302,6 +302,16 @@ void idGameLocal::Clear( void ) {
 #endif
 }
 
+static bool ( *updateDebuggerFnPtr )( idInterpreter *interpreter, idProgram *program, int instructionPointer ) = NULL;
+bool updateGameDebugger( idInterpreter *interpreter, idProgram *program, int instructionPointer ) {
+	bool ret = false;
+	if ( interpreter != NULL && program != NULL ) 	{
+		ret = updateDebuggerFnPtr ? updateDebuggerFnPtr( interpreter , program, instructionPointer ) : false;
+	}
+	return ret;
+}
+
+
 /*
 ===========
 idGameLocal::Init
@@ -410,6 +420,10 @@ void idGameLocal::Init( void ) {
 	gamestate = GAMESTATE_NOMAP;
 
 	Printf( "...%d aas types\n", aasList.Num() );
+
+	//debugger support
+	common->GetAdditionalFunction( idCommon::FT_UpdateDebugger,( idCommon::FunctionPointer * ) &updateDebuggerFnPtr,NULL);
+
 }
 
 /*
@@ -1312,7 +1326,7 @@ void idGameLocal::MapPopulate( void ) {
 idGameLocal::InitFromNewMap
 ===================
 */
-void idGameLocal::InitFromNewMap( const char *mapName, idRenderWorld *renderWorld, idSoundWorld *soundWorld, bool isServer, bool isClient, int randseed ) {
+void idGameLocal::InitFromNewMap(const char* mapName, idRenderWorld* renderWorld, idSoundWorld* soundWorld, bool isServer, bool isClient, int randseed) {
 
 	this->isServer = isServer;
 	this->isClient = isClient;
@@ -2436,14 +2450,14 @@ void idGameLocal::RunTimeGroup2() {
 idGameLocal::RunFrame
 ================
 */
-gameReturn_t idGameLocal::RunFrame( const usercmd_t *clientCmds ) {
-	idEntity *	ent;
-	int			num;
-	float		ms;
-	idTimer		timer_think, timer_events, timer_singlethink;
-	gameReturn_t ret;
-	idPlayer	*player;
-	const renderView_t *view;
+gameReturn_t idGameLocal::RunFrame(const usercmd_t* clientCmds) {
+	idEntity* ent;
+	int					num;
+	float				ms;
+	idTimer				timer_think, timer_events, timer_singlethink;
+	gameReturn_t		ret;
+	idPlayer* player;
+	const renderView_t* view;
 
 #ifdef _DEBUG
 	if ( isMultiplayer ) {
@@ -2641,7 +2655,7 @@ gameReturn_t idGameLocal::RunFrame( const usercmd_t *clientCmds ) {
 
 		// see if a target_sessionCommand has forced a changelevel
 		if ( sessionCommand.Length() ) {
-			strncpy( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
+			idStr::Copynz( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
 			break;
 		}
 
@@ -4838,7 +4852,7 @@ idGameLocal::GetBestGameType
 */
 void idGameLocal::GetBestGameType( const char* map, const char* gametype, char buf[ MAX_STRING_CHARS ] ) {
 	idStr aux = mpGame.GetBestGametype( map, gametype );
-	strncpy( buf, aux.c_str(), MAX_STRING_CHARS );
+	idStr::Copynz( buf, aux.c_str(), MAX_STRING_CHARS );
 	buf[ MAX_STRING_CHARS - 1 ] = '\0';
 }
 
diff --git a/neo/d3xp/Game_local.h b/neo/d3xp/Game_local.h
index acd4fff..f9a9693 100644
--- a/neo/d3xp/Game_local.h
+++ b/neo/d3xp/Game_local.h
@@ -365,13 +365,13 @@ public:
 
 	virtual const idDict &	GetPersistentPlayerInfo( int clientNum );
 	virtual void			SetPersistentPlayerInfo( int clientNum, const idDict &playerInfo );
-	virtual void			InitFromNewMap( const char *mapName, idRenderWorld *renderWorld, idSoundWorld *soundWorld, bool isServer, bool isClient, int randSeed );
-	virtual bool			InitFromSaveGame( const char *mapName, idRenderWorld *renderWorld, idSoundWorld *soundWorld, idFile *saveGameFile );
+	virtual void			InitFromNewMap(const char* mapName, idRenderWorld* renderWorld, idSoundWorld* soundWorld, bool isServer, bool isClient, int randSeed );
+	virtual bool			InitFromSaveGame(const char* mapName, idRenderWorld* renderWorld, idSoundWorld* soundWorld, idFile* saveGameFile );
 	virtual void			SaveGame( idFile *saveGameFile );
 	virtual void			MapShutdown( void );
 	virtual void			CacheDictionaryMedia( const idDict *dict );
 	virtual void			SpawnPlayer( int clientNum );
-	virtual gameReturn_t	RunFrame( const usercmd_t *clientCmds );
+	virtual gameReturn_t	RunFrame(const usercmd_t* clientCmds );
 	virtual bool			Draw( int clientNum );
 	virtual escReply_t		HandleESC( idUserInterface **gui );
 	virtual idUserInterface	*StartMenu( void );
diff --git a/neo/d3xp/Game_network.cpp b/neo/d3xp/Game_network.cpp
index 36d799e..7c3bb14 100644
--- a/neo/d3xp/Game_network.cpp
+++ b/neo/d3xp/Game_network.cpp
@@ -1566,7 +1566,7 @@ gameReturn_t idGameLocal::ClientPrediction( int clientNum, const usercmd_t *clie
 	}
 
 	if ( sessionCommand.Length() ) {
-		strncpy( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
+		idStr::Copynz( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
 	}
 	return ret;
 }
diff --git a/neo/d3xp/gamesys/TypeInfo.cpp b/neo/d3xp/gamesys/TypeInfo.cpp
index 2742c68..aab6c46 100644
--- a/neo/d3xp/gamesys/TypeInfo.cpp
+++ b/neo/d3xp/gamesys/TypeInfo.cpp
@@ -569,10 +569,18 @@ int idTypeInfoTools::WriteVariable_r( const void *varPtr, const char *varName, c
 		return typeSize;
 	}
 
+#if D3_SIZEOFPTR == 4
+	const uintptr_t uninitPtr = (uintptr_t)0xcdcdcdcdUL;
+#elif D3_SIZEOFPTR == 8
+	const uintptr_t uninitPtr = (uintptr_t)0xcdcdcdcdcdcdcdcdULL;
+#else
+#error "Unexpected pointer size"
+#endif
+
 	// if this is a pointer
 	isPointer = 0;
 	for ( i = typeString.Length(); i > 0 && typeString[i - 1] == '*'; i -= 2 ) {
-		if ( varPtr == (void *)0xcdcdcdcd || ( varPtr != NULL && *((unsigned int *)varPtr) == 0xcdcdcdcd ) ) {
+		if ( varPtr == (void*)uninitPtr || ( varPtr != NULL && *((unsigned int *)varPtr) == 0xcdcdcdcd ) ) {
 			common->Warning( "%s%s::%s%s references uninitialized memory", prefix, scope, varName, "" );
 			return typeSize;
 		}
diff --git a/neo/d3xp/script/Script_Compiler.cpp b/neo/d3xp/script/Script_Compiler.cpp
index 8e6e308..7f28d8e 100644
--- a/neo/d3xp/script/Script_Compiler.cpp
+++ b/neo/d3xp/script/Script_Compiler.cpp
@@ -28,6 +28,7 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "sys/platform.h"
 #include "idlib/Timer.h"
+#include "framework/FileSystem.h"
 
 #include "script/Script_Thread.h"
 #include "Game_local.h"
@@ -2620,6 +2621,8 @@ void idCompiler::CompileFile( const char *text, const char *filename, bool toCon
 
 	compile_time.Start();
 
+	idStr origFileName = filename; // DG: filename pointer might become invalid when calling NextToken() below
+
 	scope				= &def_namespace;
 	basetype			= NULL;
 	callthread			= false;
@@ -2687,6 +2690,11 @@ void idCompiler::CompileFile( const char *text, const char *filename, bool toCon
 
 	compile_time.Stop();
 	if ( !toConsole ) {
-		gameLocal.Printf( "Compiled '%s': %u ms\n", filename, compile_time.Milliseconds() );
+		// DG: filename can be overwritten by NextToken() (via gameLocal.program.GetFilenum()), so
+		//     use a copy, origFileName, that's still valid here. Furthermore, the path is nonsense,
+		//     as idProgram::CompileText() called fileSystem->RelativePathToOSPath() on it
+		//     which does not return the *actual* full path of that file but invents one,
+		//     so revert that to the relative filename which at least isn't misleading
+		gameLocal.Printf( "Compiled '%s': %u ms\n", fileSystem->OSPathToRelativePath(origFileName), compile_time.Milliseconds() );
 	}
 }
diff --git a/neo/d3xp/script/Script_Interpreter.cpp b/neo/d3xp/script/Script_Interpreter.cpp
index c8e0cc4..8efd298 100644
--- a/neo/d3xp/script/Script_Interpreter.cpp
+++ b/neo/d3xp/script/Script_Interpreter.cpp
@@ -33,6 +33,11 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "script/Script_Interpreter.h"
 
+#include "framework/FileSystem.h"
+
+// HvG: Debugger support
+extern bool updateGameDebugger( idInterpreter *interpreter, idProgram *program, int instructionPointer );
+
 /*
 ================
 idInterpreter::idInterpreter()
@@ -183,7 +188,6 @@ idInterpreter::GetRegisterValue
 Returns a string representation of the value of the register.  This is
 used primarily for the debugger and debugging
 
-//FIXME:  This is pretty much wrong.  won't access data in most situations.
 ================
 */
 bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDepth ) {
@@ -191,17 +195,17 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 	idVarDef		*d;
 	char			funcObject[ 1024 ];
 	char			*funcName;
-	const idVarDef	*scope;
+	const idVarDef	*scope = NULL;
+	const idVarDef	*scopeObj;
 	const idTypeDef	*field;
-	const idScriptObject *obj;
 	const function_t *func;
 
 	out.Empty();
-
+	
 	if ( scopeDepth == -1 ) {
 		scopeDepth = callStackDepth;
-	}
-
+	}	
+	
 	if ( scopeDepth == callStackDepth ) {
 		func = currentFunction;
 	} else {
@@ -215,35 +219,44 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 	funcName = strstr( funcObject, "::" );
 	if ( funcName ) {
 		*funcName = '\0';
-		scope = gameLocal.program.GetDef( NULL, funcObject, &def_namespace );
-		funcName += 2;
+		scopeObj = gameLocal.program.GetDef( NULL, funcObject, &def_namespace );
+		funcName += 2;				
+		if ( scopeObj )
+		{
+			scope = gameLocal.program.GetDef( NULL, funcName, scopeObj );
+		}
 	} else {
 		funcName = funcObject;
-		scope = &def_namespace;
+		scope = gameLocal.program.GetDef( NULL, func->Name(), &def_namespace );
+		scopeObj = NULL;
 	}
 
-	// Get the function from the object
-	d = gameLocal.program.GetDef( NULL, funcName, scope );
-	if ( !d ) {
+	if ( !scope )
+	{
 		return false;
 	}
 
-	// Get the variable itself and check various namespaces
-	d = gameLocal.program.GetDef( NULL, name, d );
-	if ( !d ) {
-		if ( scope == &def_namespace ) {
-			return false;
-		}
-
-		d = gameLocal.program.GetDef( NULL, name, scope );
-		if ( !d ) {
-			d = gameLocal.program.GetDef( NULL, name, &def_namespace );
-			if ( !d ) {
-				return false;
+	d = gameLocal.program.GetDef( NULL, name, scope );
+	
+	// Check the objects for it if it wasn't local to the function
+	if ( !d )
+	{
+		for ( ; scopeObj && scopeObj->TypeDef()->SuperClass(); scopeObj = scopeObj->TypeDef()->SuperClass()->def )
+		{
+			d = gameLocal.program.GetDef( NULL, name, scopeObj );
+			if ( d )
+			{
+				break;
 			}
 		}
-	}
+	}	
 
+	if ( !d )
+	{
+		out = "???";
+		return false;
+	}
+	
 	reg = GetVariable( d );
 	switch( d->Type() ) {
 	case ev_float:
@@ -256,7 +269,7 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 		break;
 
 	case ev_vector:
-		if ( reg.vectorPtr ) {
+		if ( reg.vectorPtr ) {				
 			out = va( "%g,%g,%g", reg.vectorPtr->x, reg.vectorPtr->y, reg.vectorPtr->z );
 		} else {
 			out = "0,0,0";
@@ -274,30 +287,55 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 		break;
 
 	case ev_field:
+	{
+		idEntity*		entity;			
+		idScriptObject*	obj;
+		
 		if ( scope == &def_namespace ) {
 			// should never happen, but handle it safely anyway
 			return false;
 		}
 
-		field = scope->TypeDef()->GetParmType( reg.ptrOffset )->FieldType();
-		obj   = *reinterpret_cast<const idScriptObject **>( &localstack[ callStack[ callStackDepth ].stackbase ] );
-		if ( !field || !obj ) {
+		field  = d->TypeDef()->FieldType();
+		entity = GetEntity ( *((int*)&localstack[ localstackBase ]) );
+		if ( !entity || !field )
+		{
 			return false;
 		}
 
+		obj = &entity->scriptObject;
+		if ( !obj ) {
+			return false;
+		}
+		
 		switch ( field->Type() ) {
-		case ev_boolean:
-			out = va( "%d", *( reinterpret_cast<int *>( &obj->data[ reg.ptrOffset ] ) ) );
-			return true;
-
-		case ev_float:
-			out = va( "%g", *( reinterpret_cast<float *>( &obj->data[ reg.ptrOffset ] ) ) );
-			return true;
+			case ev_boolean:
+				out = va( "%d", *( reinterpret_cast<int *>( &obj->data[ reg.ptrOffset ] ) ) );
+				return true;
+
+			case ev_float:
+				out = va( "%g", *( reinterpret_cast<float *>( &obj->data[ reg.ptrOffset ] ) ) );
+				return true;
+				
+			case ev_string:	{
+				const char* str;
+				str = reinterpret_cast<const char*>( &obj->data[ reg.ptrOffset ] );
+				if ( !str ) {
+					out = "\"\"";
+				} else {
+					out  = "\"";
+					out += str;			
+					out += "\"";
+				}
+				return true;
+			}
 
-		default:
-			return false;
+			default:
+				return false;
 		}
+		
 		break;
+	}
 
 	case ev_string:
 		if ( reg.stringPtr ) {
@@ -313,7 +351,6 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 		return false;
 	}
 }
-
 /*
 ================
 idInterpreter::GetCallstackDepth
@@ -969,6 +1006,19 @@ bool idInterpreter::Execute( void ) {
 		// next statement
 		st = &gameLocal.program.GetStatement( instructionPointer );
 
+		if ( !updateGameDebugger( this, &gameLocal.program, instructionPointer )
+			&& g_debugScript.GetBool( ) ) 
+		{
+			static int lastLineNumber = -1;
+			if (lastLineNumber != gameLocal.program.GetStatement(instructionPointer).linenumber) {
+				gameLocal.Printf("%s (%d)\n",
+					gameLocal.program.GetFilename(gameLocal.program.GetStatement(instructionPointer).file),
+					gameLocal.program.GetStatement(instructionPointer).linenumber
+				);
+				lastLineNumber = gameLocal.program.GetStatement(instructionPointer).linenumber;
+			}
+		}
+
 		switch( st->op ) {
 		case OP_RETURN:
 			LeaveFunction( st->a );
@@ -1833,3 +1883,99 @@ bool idInterpreter::Execute( void ) {
 
 	return threadDying;
 }
+
+
+bool idGameEditExt::CheckForBreakPointHit(const idInterpreter* interpreter, const function_t* function1, const function_t* function2, int depth) const
+{
+	return ((interpreter->GetCurrentFunction() == function1 ||
+		interpreter->GetCurrentFunction() == function2) &&
+		(interpreter->GetCallstackDepth() <= depth));
+}
+
+bool idGameEditExt::ReturnedFromFunction(const idProgram* program, const idInterpreter* interpreter, int index) const
+{
+
+	return (const_cast<idProgram*>(program)->GetStatement(index).op == OP_RETURN && interpreter->GetCallstackDepth() <= 1);
+}
+
+bool idGameEditExt::GetRegisterValue(const idInterpreter* interpreter, const char* name, idStr& out, int scopeDepth) const
+{
+	return const_cast<idInterpreter*>(interpreter)->GetRegisterValue(name, out, scopeDepth);
+}
+
+const idThread* idGameEditExt::GetThread(const idInterpreter* interpreter) const
+{
+	return interpreter->GetThread();
+}
+
+void idGameEditExt::MSG_WriteCallstackFunc(idBitMsg* msg, const prstack_t* stack, const idProgram* program, int instructionPtr)
+{
+	const statement_t* st;
+	const function_t* func;
+
+	func = stack->f;
+
+	// If the function is unknown then just fill in with default data.
+	if (!func)
+	{
+		msg->WriteString("<UNKNOWN>");
+		msg->WriteString("<UNKNOWN>");
+		msg->WriteInt(0);
+		return;
+	}
+	else
+	{
+		msg->WriteString(va("%s(  )", func->Name()));
+	}
+
+	if (stack->s == -1) //this is a fake stack created by debugger, use instruction pointer for retrieval.
+		st = &const_cast<idProgram*>(program)->GetStatement(instructionPtr);
+	else // Use the calling statement as the filename and linenumber where the call was made from		
+		st = &const_cast<idProgram*>(program)->GetStatement(stack->s);
+
+	if (st)
+	{
+		idStr qpath = const_cast<idProgram*>(program)->GetFilename(st->file);
+		if (idStr::FindChar(qpath, ':') != -1)
+			qpath = fileSystem->OSPathToRelativePath(qpath.c_str());
+		qpath.BackSlashesToSlashes();
+		msg->WriteString(qpath);
+		msg->WriteInt(st->linenumber);
+	}
+	else
+	{
+		msg->WriteString("<UNKNOWN>");
+		msg->WriteInt(0);
+	}
+}
+
+void idGameEditExt::MSG_WriteInterpreterInfo(idBitMsg* msg, const idInterpreter* interpreter, const idProgram* program, int instructionPtr)
+{
+	int			i;
+	prstack_s	temp;
+
+	msg->WriteShort((int)interpreter->GetCallstackDepth());
+
+	// write out the current function
+	temp.f = interpreter->GetCurrentFunction();
+	temp.s = -1;
+	temp.stackbase = 0;
+	MSG_WriteCallstackFunc(msg, &temp, program, instructionPtr);
+
+	// Run through all of the callstack and write each to the msg
+	for (i = interpreter->GetCallstackDepth() - 1; i > 0; i--)
+	{
+		MSG_WriteCallstackFunc(msg, interpreter->GetCallstack() + i, program, instructionPtr);
+	}
+}
+
+
+int idGameEditExt::GetInterpreterCallStackDepth(const idInterpreter* interpreter)
+{
+	return interpreter->GetCallstackDepth();
+}
+
+const function_t* idGameEditExt::GetInterpreterCallStackFunction(const idInterpreter* interpreter, int stackDepth/* = -1*/)
+{
+	return interpreter->GetCallstack()[stackDepth > -1 ? stackDepth : interpreter->GetCallstackDepth()].f;
+}
\ No newline at end of file
diff --git a/neo/d3xp/script/Script_Thread.cpp b/neo/d3xp/script/Script_Thread.cpp
index cb82633..ef0b8ed 100644
--- a/neo/d3xp/script/Script_Thread.cpp
+++ b/neo/d3xp/script/Script_Thread.cpp
@@ -1921,3 +1921,49 @@ void idThread::Event_InfluenceActive( void ) {
 		idThread::ReturnInt( false );
 	}
 }
+
+int idGameEditExt::ThreadGetNum(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->GetThreadNum();
+}
+
+const char* idGameEditExt::ThreadGetName(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->GetThreadName();
+}
+
+int	idGameEditExt::GetTotalScriptThreads() const
+{
+	return idThread::GetThreads().Num();
+}
+
+const idThread* idGameEditExt::GetThreadByIndex(int index) const
+{
+	return idThread::GetThreads()[index];
+}
+
+bool idGameEditExt::ThreadIsDoneProcessing(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsDoneProcessing();
+}
+
+bool idGameEditExt::ThreadIsWaiting(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsWaiting();
+}
+
+bool idGameEditExt::ThreadIsDying(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsDying();
+}
+
+void idGameEditExt::MSG_WriteThreadInfo(idBitMsg* msg, const idThread* thread, const idInterpreter* interpreter)
+{
+	msg->WriteString(const_cast<idThread*>(thread)->GetThreadName());
+	msg->WriteInt(const_cast<idThread*>(thread)->GetThreadNum());
+
+	msg->WriteBits((int)(thread == interpreter->GetThread()), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsDoneProcessing(), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsWaiting(), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsDying(), 1);
+}
\ No newline at end of file
diff --git a/neo/framework/CVarSystem.cpp b/neo/framework/CVarSystem.cpp
index 413e3d5..a10860e 100644
--- a/neo/framework/CVarSystem.cpp
+++ b/neo/framework/CVarSystem.cpp
@@ -244,7 +244,7 @@ void idInternalCVar::UpdateValue( void ) {
 				clamped = true;
 			}
 		}
-		if ( clamped || !idStr::IsNumeric( value ) || idStr::FindChar( value, '.' ) ) {
+		if ( clamped || !idStr::IsNumeric( value ) || (idStr::FindChar( value, '.' )!=-1) ) {
 			valueString = idStr( integerValue );
 			value = valueString.c_str();
 		}
diff --git a/neo/framework/CVarSystem.h b/neo/framework/CVarSystem.h
index 1f1bd31..0f262dc 100644
--- a/neo/framework/CVarSystem.h
+++ b/neo/framework/CVarSystem.h
@@ -182,6 +182,8 @@ private:
 	static idCVar *			staticVars;
 };
 
+static idCVar const * const staticCVarsInvalid = (const idCVar*)(uintptr_t)0xFFFFFFFF;
+
 ID_INLINE idCVar::idCVar( const char *name, const char *value, int flags, const char *description,
 							argCompletion_t valueCompletion ) {
 	if ( !valueCompletion && ( flags & CVAR_BOOL ) ) {
@@ -293,7 +295,7 @@ ID_INLINE void idCVar::Init( const char *name, const char *value, int flags, con
 	this->integerValue = 0;
 	this->floatValue = 0.0f;
 	this->internalVar = this;
-	if ( staticVars != (idCVar *)0xFFFFFFFF ) {
+	if ( staticVars != staticCVarsInvalid ) {
 		this->next = staticVars;
 		staticVars = this;
 	} else {
@@ -302,11 +304,11 @@ ID_INLINE void idCVar::Init( const char *name, const char *value, int flags, con
 }
 
 ID_INLINE void idCVar::RegisterStaticVars( void ) {
-	if ( staticVars != (idCVar *)0xFFFFFFFF ) {
+	if ( staticVars != staticCVarsInvalid ) {
 		for ( idCVar *cvar = staticVars; cvar; cvar = cvar->next ) {
 			cvarSystem->Register( cvar );
 		}
-		staticVars = (idCVar *)0xFFFFFFFF;
+		staticVars = (idCVar*)staticCVarsInvalid;
 	}
 }
 
diff --git a/neo/framework/Common.cpp b/neo/framework/Common.cpp
index d0bda13..ed51965 100644
--- a/neo/framework/Common.cpp
+++ b/neo/framework/Common.cpp
@@ -98,6 +98,10 @@ idCVar com_timescale( "timescale", "1", CVAR_SYSTEM | CVAR_FLOAT, "scales the ti
 idCVar com_makingBuild( "com_makingBuild", "0", CVAR_BOOL | CVAR_SYSTEM, "1 when making a build" );
 idCVar com_updateLoadSize( "com_updateLoadSize", "0", CVAR_BOOL | CVAR_SYSTEM | CVAR_NOCHEAT, "update the load size after loading a map" );
 
+idCVar com_enableDebuggerServer( "com_enableDebuggerServer", "0", CVAR_BOOL | CVAR_SYSTEM, "toggle debugger server and try to connect to com_dbgClientAdr" );
+idCVar com_dbgClientAdr( "com_dbgClientAdr", "localhost", CVAR_SYSTEM | CVAR_ARCHIVE, "debuggerApp client address" );
+idCVar com_dbgServerAdr( "com_dbgServerAdr", "localhost", CVAR_SYSTEM | CVAR_ARCHIVE, "debugger server address" );
+
 idCVar com_product_lang_ext( "com_product_lang_ext", "1", CVAR_INTEGER | CVAR_SYSTEM | CVAR_ARCHIVE, "Extension to use when creating language files." );
 
 // com_speeds times
@@ -112,6 +116,8 @@ volatile int	com_ticNumber;			// 60 hz tics
 int				com_editors;			// currently opened editor(s)
 bool			com_editorActive;		//  true if an editor has focus
 
+bool			com_debuggerSupported;	// only set to true when the updateDebugger function is set. see GetAdditionalFunction()
+
 #ifdef _WIN32
 HWND			com_hwndMsg = NULL;
 bool			com_outputMsg = false;
@@ -245,6 +251,7 @@ idCommonLocal::idCommonLocal( void ) {
 	com_fullyInitialized = false;
 	com_refreshOnPrint = false;
 	com_errorEntered = 0;
+	com_debuggerSupported = false;
 
 	strcpy( errorMessage, "" );
 
@@ -382,12 +389,17 @@ void idCommonLocal::VPrintf( const char *fmt, va_list args ) {
 	// remove any color codes
 	idStr::RemoveColors( msg );
 
-	// echo to dedicated console and early console
-	Sys_Printf( "%s", msg );
-
-	// print to script debugger server
-	// DebuggerServerPrint( msg );
-
+	if ( com_enableDebuggerServer.GetBool( ) ) 	{
+		// print to script debugger server
+		if ( com_editors & EDITOR_DEBUGGER )
+			DebuggerServerPrint( msg );
+		else
+			// only echo to dedicated console and early console when debugger is not running, so no
+			// deadlocks occur if engine functions called from the debugger thread trace stuff.
+			Sys_Printf( "%s", msg );
+	} else {
+		Sys_Printf( "%s", msg );
+	}
 #if 0	// !@#
 #if defined(_DEBUG) && defined(WIN32)
 	if ( strlen( msg ) < 512 ) {
@@ -984,7 +996,6 @@ Activates or Deactivates a tool
 */
 void idCommonLocal::ActivateTool( bool active ) {
 	com_editorActive = active;
-	Sys_GrabMouseCursor( !active );
 }
 
 /*
@@ -1134,8 +1145,14 @@ Com_ScriptDebugger_f
 static void Com_ScriptDebugger_f( const idCmdArgs &args ) {
 	// Make sure it wasnt on the command line
 	if ( !( com_editors & EDITOR_DEBUGGER ) ) {
-		common->Printf( "Script debugger is currently disabled\n" );
-		// DebuggerClientLaunch();
+		
+		//start debugger server if needed
+		if ( !com_enableDebuggerServer.GetBool() )
+			com_enableDebuggerServer.SetBool( true );
+
+		//start debugger client.
+		DebuggerClientLaunch();
+
 	}
 }
 
@@ -2020,6 +2037,7 @@ void Com_LocalizeMaps_f( const idCmdArgs &args ) {
 		strCount += LocalizeMap(args.Argv(2), strTable, listHash, excludeList, write);
 	} else {
 		idStrList files;
+		//wow, what now? a hardcoded path?
 		GetFileList("z:/d3xp/d3xp/maps/game", "*.map", files);
 		for ( int i = 0; i < files.Num(); i++ ) {
 			idStr file =  fileSystem->OSPathToRelativePath(files[i]);
@@ -2398,6 +2416,14 @@ void idCommonLocal::Frame( void ) {
 			InitSIMD();
 		}
 
+		if ( com_enableDebuggerServer.IsModified() ) {
+			if ( com_enableDebuggerServer.GetBool() ) {
+				DebuggerServerInit();
+			} else {
+				DebuggerServerShutdown();
+			}
+		}
+
 		eventLoop->RunEventLoop();
 
 		com_frameTime = com_ticNumber * USERCMD_MSEC;
@@ -2566,7 +2592,7 @@ void idCommonLocal::Async( void ) {
 =================
 idCommonLocal::LoadGameDLLbyName
 
-Helper for LoadGameDLL() to make it less painfull to try different dll names.
+Helper for LoadGameDLL() to make it less painful to try different dll names.
 =================
 */
 void idCommonLocal::LoadGameDLLbyName( const char *dll, idStr& s ) {
@@ -2666,7 +2692,7 @@ void idCommonLocal::LoadGameDLL( void ) {
 	gameImport.AASFileManager			= ::AASFileManager;
 	gameImport.collisionModelManager	= ::collisionModelManager;
 
-	gameExport							= *GetGameAPI( &gameImport );
+	gameExport							= *GetGameAPI( &gameImport);
 
 	if ( gameExport.version != GAME_API_VERSION ) {
 		Sys_DLL_Unload( gameDLL );
@@ -2709,6 +2735,7 @@ void idCommonLocal::UnloadGameDLL( void ) {
 
 #endif
 
+	com_debuggerSupported = false; // HvG: Reset debugger availability.
 	gameCallbacks.Reset(); // DG: these callbacks are invalid now because DLL has been unloaded
 }
 
@@ -2896,6 +2923,17 @@ void idCommonLocal::Init( int argc, char **argv ) {
 
 	Sys_InitThreads();
 
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	/* Force the window to minimize when focus is lost. This was the
+	 * default behavior until SDL 2.0.12 and changed with 2.0.14.
+	 * The window staying maximized has some odd implications for
+	 * window ordering under Windows and some X11 window managers
+	 * like kwin. See:
+	 *  * https://github.com/libsdl-org/SDL/issues/4039
+	 *  * https://github.com/libsdl-org/SDL/issues/3656 */
+	SDL_SetHint( SDL_HINT_VIDEO_MINIMIZE_ON_FOCUS_LOSS, "1" );
+#endif
+
 	try {
 
 		// set interface pointers used by idLib
@@ -2935,6 +2973,10 @@ void idCommonLocal::Init( int argc, char **argv ) {
 		Printf( "%s using SDL v%u.%u.%u\n",
 				version.string, sdlv.major, sdlv.minor, sdlv.patch );
 
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		Printf( "SDL video driver: %s\n", SDL_GetCurrentVideoDriver() );
+#endif
+
 		// initialize key input/binding, done early so bind command exists
 		idKeyInput::Init();
 
@@ -3164,14 +3206,15 @@ void idCommonLocal::InitGame( void ) {
 	// initialize the user interfaces
 	uiManager->Init();
 
-	// startup the script debugger
-	// DebuggerServerInit();
-
 	PrintLoadingMessage( common->GetLanguageDict()->GetString( "#str_04350" ) );
 
 	// load the game dll
 	LoadGameDLL();
 
+	// startup the script debugger
+	if ( com_enableDebuggerServer.GetBool( ) )
+		DebuggerServerInit( );
+
 	PrintLoadingMessage( common->GetLanguageDict()->GetString( "#str_04351" ) );
 
 	// init the session
@@ -3203,7 +3246,8 @@ void idCommonLocal::ShutdownGame( bool reloading ) {
 	}
 
 	// shutdown the script debugger
-	// DebuggerServerShutdown();
+	if ( com_enableDebuggerServer.GetBool() )	
+		DebuggerServerShutdown();
 
 	idAsyncNetwork::client.Shutdown();
 
@@ -3266,11 +3310,21 @@ bool idCommonLocal::SetCallback(idCommon::CallbackType cbt, idCommon::FunctionPo
 	}
 }
 
-static bool isDemo(void)
+static bool isDemo( void )
 {
 	return sessLocal.IsDemoVersion();
 }
 
+static bool updateDebugger( idInterpreter *interpreter, idProgram *program, int instructionPointer )
+{
+	if (com_editors & EDITOR_DEBUGGER) 
+	{
+		DebuggerServerCheckBreakpoint( interpreter, program, instructionPointer );
+		return true;
+	}
+	return false;
+}
+
 // returns true if that function is available in this version of dhewm3
 // *out_fnptr will be the function (you'll have to cast it probably)
 // *out_userArg will be an argument you have to pass to the function, if appropriate (else NULL)
@@ -3284,6 +3338,7 @@ bool idCommonLocal::GetAdditionalFunction(idCommon::FunctionType ft, idCommon::F
 		Warning("Called idCommon::GetAdditionalFunction() with out_fnptr == NULL!\n");
 		return false;
 	}
+
 	switch(ft)
 	{
 		case idCommon::FT_IsDemo:
@@ -3291,6 +3346,11 @@ bool idCommonLocal::GetAdditionalFunction(idCommon::FunctionType ft, idCommon::F
 			// don't set *out_userArg, this function takes no arguments
 			return true;
 
+		case idCommon::FT_UpdateDebugger:
+			*out_fnptr = (idCommon::FunctionPointer)updateDebugger;
+			com_debuggerSupported = true;
+			return true;
+
 		default:
 			*out_fnptr = NULL;
 			Warning("Called idCommon::SetCallback() with unknown FunctionType %d!\n", ft);
@@ -3298,7 +3358,6 @@ bool idCommonLocal::GetAdditionalFunction(idCommon::FunctionType ft, idCommon::F
 	}
 }
 
-
 idGameCallbacks gameCallbacks;
 
 idGameCallbacks::idGameCallbacks()
diff --git a/neo/framework/Common.h b/neo/framework/Common.h
index c0400bb..3262e6c 100644
--- a/neo/framework/Common.h
+++ b/neo/framework/Common.h
@@ -73,6 +73,9 @@ extern idCVar		com_showAsyncStats;
 extern idCVar		com_showSoundDecoders;
 extern idCVar		com_makingBuild;
 extern idCVar		com_updateLoadSize;
+extern idCVar		com_enableDebuggerServer;
+extern idCVar		com_dbgClientAdr;
+extern idCVar		com_dbgServerAdr;
 
 extern int			time_gameFrame;			// game logic time
 extern int			time_gameDraw;			// game present time
@@ -84,6 +87,8 @@ extern volatile int	com_ticNumber;			// 60 hz tics, incremented by async functio
 extern int			com_editors;			// current active editor(s)
 extern bool			com_editorActive;		// true if an editor has focus
 
+extern bool			com_debuggerSupported;	// only set to true when the updateDebugger function is set. see GetAdditionalFunction()
+
 #ifdef _WIN32
 const char			DMAP_MSGID[] = "DMAPOutput";
 const char			DMAP_DONE[] = "DMAPDone";
@@ -111,6 +116,8 @@ struct MemInfo_t {
 };
 
 class idLangDict;
+class idInterpreter;
+class idProgram;
 
 class idCommon {
 public:
@@ -158,6 +165,7 @@ public:
 								// Writes cvars with the given flags to a file.
 	virtual void				WriteFlaggedCVarsToFile( const char *filename, int flags, const char *setCmd ) = 0;
 
+
 								// Begins redirection of console output to the given buffer.
 	virtual void				BeginRedirect( char *buffer, int buffersize, void (*flush)( const char * ) ) = 0;
 
@@ -265,6 +273,11 @@ public:
 		// it returns true if we're currently running the doom3 demo
 		// not relevant for mods, only for game/ aka base.dll/base.so/...
 		FT_IsDemo = 1,
+		// the function's signature is bool fn(idInterpreter,idProgram,int) with arguments:
+		// idInterpreter *interpreter, idProgram *program, int instructionPointer
+		// it returns true if the game debugger is active.
+		// relevant for mods.
+		FT_UpdateDebugger,
 	};
 
 	// returns true if that function is available in this version of dhewm3
diff --git a/neo/framework/Console.cpp b/neo/framework/Console.cpp
index 769d997..8271074 100644
--- a/neo/framework/Console.cpp
+++ b/neo/framework/Console.cpp
@@ -797,8 +797,9 @@ bool	idConsoleLocal::ProcessEvent( const sysEvent_t *event, bool forceAccept ) {
 	bool consoleKey = false;
 	if(event->evType == SE_KEY)
 	{
-		if( event->evValue == Sys_GetConsoleKey( false ) || event->evValue == Sys_GetConsoleKey( true )
-		    || (event->evValue == K_ESCAPE && idKeyInput::IsDown( K_SHIFT )) ) // shift+esc should also open console
+		bool shiftPressed = idKeyInput::IsDown( K_SHIFT );
+		if( event->evValue == K_CONSOLE || event->evValue == Sys_GetConsoleKey( shiftPressed )
+		   || (event->evValue == K_ESCAPE && shiftPressed) ) // shift+esc should also open console
 		{
 			consoleKey = true;
 		}
@@ -825,7 +826,6 @@ bool	idConsoleLocal::ProcessEvent( const sysEvent_t *event, bool forceAccept ) {
 		// a down event will toggle the destination lines
 		if ( keyCatching ) {
 			Close();
-			Sys_GrabMouseCursor( true );
 			cvarSystem->SetCVarBool( "ui_chat", false );
 		} else {
 			consoleField.Clear();
@@ -850,7 +850,7 @@ bool	idConsoleLocal::ProcessEvent( const sysEvent_t *event, bool forceAccept ) {
 	// handle key and character events
 	if ( event->evType == SE_CHAR ) {
 		// never send the console key as a character
-		if ( event->evValue != Sys_GetConsoleKey( false ) && event->evValue != Sys_GetConsoleKey( true ) ) {
+		if ( event->evValue != Sys_GetConsoleKey( idKeyInput::IsDown( K_SHIFT ) ) ) {
 			consoleField.CharEvent( event->evValue );
 		}
 		return true;
diff --git a/neo/framework/DeclManager.cpp b/neo/framework/DeclManager.cpp
index 8dc014b..d7a35c5 100644
--- a/neo/framework/DeclManager.cpp
+++ b/neo/framework/DeclManager.cpp
@@ -1782,6 +1782,7 @@ idDeclLocal::idDeclLocal( void ) {
 	everReferenced = false;
 	redefinedInReload = false;
 	nextInFile = NULL;
+	self = NULL;
 }
 
 /*
diff --git a/neo/framework/EditField.cpp b/neo/framework/EditField.cpp
index ada5d06..65913d3 100644
--- a/neo/framework/EditField.cpp
+++ b/neo/framework/EditField.cpp
@@ -478,7 +478,8 @@ void idEditField::KeyDownEvent( int key ) {
 	}
 
 	// clear autocompletion buffer on normal key input
-	if ( key != K_CAPSLOCK && key != K_ALT && key != K_CTRL && key != K_SHIFT ) {
+	if ( key != K_CAPSLOCK && key != K_ALT && key != K_CTRL && key != K_SHIFT
+	     && key != K_RIGHT_CTRL && key != K_RIGHT_SHIFT ) { // TODO: K_RIGHT_ALT ?
 		ClearAutoComplete();
 	}
 }
@@ -504,7 +505,7 @@ void idEditField::Paste( void ) {
 		CharEvent( cbd[i] );
 	}
 
-	Mem_Free( cbd );
+	Sys_FreeClipboardData( cbd );
 }
 
 /*
diff --git a/neo/framework/FileSystem.cpp b/neo/framework/FileSystem.cpp
index bbe77f0..31b952a 100644
--- a/neo/framework/FileSystem.cpp
+++ b/neo/framework/FileSystem.cpp
@@ -886,10 +886,20 @@ const char *idFileSystemLocal::OSPathToRelativePath( const char *OSPath ) {
 	}
 
 	if ( base ) {
-		s = strstr( base, "/" );
-		if ( !s ) {
-			s = strstr( base, "\\" );
+		// DG: on Windows base might look like "base\\pak008.pk4/script/doom_util.script"
+		//     while on Linux it'll be more like "base/pak008.pk4/script/doom_util.script"
+		//     I /think/ we want to get rid of the bla.pk4 part, at least that's what happens implicitly on Windows
+		//     (I hope these problems don't exist if the file is not from a .pk4, so that case is handled like before)
+		s = strstr( base, ".pk4/" );
+		if ( s != NULL ) {
+			s += 4; // skip ".pk4", but *not* the following '/', that'll be skipped below
+		} else {
+			s = strchr( base, '/' );
+			if ( s == NULL ) {
+				s = strchr( base, '\\' );
+			}
 		}
+
 		if ( s ) {
 			strcpy( relativePath, s + 1 );
 			if ( fs_debug.GetInteger() > 1 ) {
diff --git a/neo/framework/FileSystem.h b/neo/framework/FileSystem.h
index ad1a8de..6857fb8 100644
--- a/neo/framework/FileSystem.h
+++ b/neo/framework/FileSystem.h
@@ -61,7 +61,10 @@ If you have questions concerning this license or the applicable additional terms
 //            => change it (to -1?) or does that break anything?
 static const ID_TIME_T	FILE_NOT_FOUND_TIMESTAMP	= 0xFFFFFFFF;
 static const int		MAX_PURE_PAKS				= 128;
-static const int		MAX_OSPATH					= FILENAME_MAX;
+// DG: https://www.gnu.org/software/libc/manual/html_node/Limits-for-Files.html says
+//     that FILENAME_MAX can be *really* big on some systems and thus is not suitable
+//     for buffer lengths. So limit it to prevent stack overflow/out of memory issues
+static const int		MAX_OSPATH					= (FILENAME_MAX < 32000) ? FILENAME_MAX : 32000;
 
 // modes for OpenFileByMode. used as bit mask internally
 typedef enum {
diff --git a/neo/framework/Game.h b/neo/framework/Game.h
index 8a4c4e1..af64d2c 100644
--- a/neo/framework/Game.h
+++ b/neo/framework/Game.h
@@ -31,6 +31,7 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "idlib/BitMsg.h"
 #include "idlib/Dict.h"
+#include "idlib/containers/StrList.h"
 #include "framework/UsercmdGen.h"
 #include "renderer/RenderWorld.h"
 #include "sound/sound.h"
@@ -232,6 +233,11 @@ enum {
 
 class idEntity;
 class idMD5Anim;
+class idThread;
+class function_t;
+class idProgram;
+class idInterpreter;
+typedef struct prstack_s prstack_t;
 
 // FIXME: this interface needs to be reworked but it properly separates code for the time being
 class idGameEdit {
@@ -294,7 +300,7 @@ public:
 	virtual void				EntitySetColor( idEntity *ent, const idVec3 color );
 
 	// Player methods.
-	virtual bool				PlayerIsValid() const;
+	virtual bool				PlayerIsValid( ) const;
 	virtual void				PlayerGetOrigin( idVec3 &org ) const;
 	virtual void				PlayerGetAxis( idMat3 &axis ) const;
 	virtual void				PlayerGetViewAngles( idAngles &angles ) const;
@@ -310,11 +316,44 @@ public:
 	virtual int					MapGetEntitiesMatchingClassWithString( const char *classname, const char *match, const char *list[], const int max ) const;
 	virtual void				MapRemoveEntity( const char *name ) const;
 	virtual void				MapEntityTranslate( const char *name, const idVec3 &v ) const;
-
 };
 
 extern idGameEdit *				gameEdit;
 
+// In-game script debugging support
+class idGameEditExt : public idGameEdit {
+public:
+	virtual						~idGameEditExt( void ) { }
+	// idProgram
+	virtual void				GetLoadedScripts( idStrList ** result );
+	virtual bool				IsLineCode( const char* filename, int linenumber ) const;
+	virtual const char *		GetFilenameForStatement( idProgram* program, int index ) const;
+	virtual int					GetLineNumberForStatement( idProgram* program, int index ) const;
+
+	// idInterpreter
+	virtual bool				CheckForBreakPointHit( const idInterpreter* interpreter, const function_t* function1, const function_t* function2, int depth ) const;
+	virtual bool				ReturnedFromFunction( const idProgram* program, const idInterpreter* interpreter, int index ) const;
+	virtual bool				GetRegisterValue( const idInterpreter* interpreter, const char* name, idStr& out, int scopeDepth ) const;
+	virtual const idThread*		GetThread( const idInterpreter* interpreter ) const;
+	virtual int					GetInterpreterCallStackDepth( const idInterpreter* interpreter );
+	virtual const function_t*	GetInterpreterCallStackFunction( const idInterpreter* interpreter, int stackDepth = -1 );
+
+	// idThread
+	virtual const char *		ThreadGetName( const idThread* thread ) const;
+	virtual int					ThreadGetNum( const idThread* thread ) const;
+	virtual bool				ThreadIsDoneProcessing( const idThread* thread ) const;
+	virtual bool				ThreadIsWaiting( const idThread* thread ) const;
+	virtual bool				ThreadIsDying( const idThread* thread ) const;
+	virtual int					GetTotalScriptThreads( ) const;
+	virtual const idThread*		GetThreadByIndex( int index ) const;
+
+	// MSG helpers
+	virtual void				MSG_WriteThreadInfo( idBitMsg* msg, const idThread* thread, const idInterpreter* interpreter );
+	virtual void				MSG_WriteCallstackFunc( idBitMsg* msg, const prstack_t* stack, const idProgram* program, int instructionPtr );
+	virtual void				MSG_WriteInterpreterInfo( idBitMsg* msg, const idInterpreter* interpreter, const idProgram* program, int instructionPtr );
+	virtual void				MSG_WriteScriptList( idBitMsg* msg );
+};
+
 
 /*
 ===============================================================================
diff --git a/neo/framework/GameCallbacks_local.h b/neo/framework/GameCallbacks_local.h
index 51d9763..58bad7f 100644
--- a/neo/framework/GameCallbacks_local.h
+++ b/neo/framework/GameCallbacks_local.h
@@ -40,9 +40,8 @@ If you have questions concerning this license or the applicable additional terms
 struct idGameCallbacks {
 
 	typedef void (*ReloadImagesCallback)(void* userArg, const idCmdArgs &args);
-	ReloadImagesCallback reloadImagesCB;
-	void*                reloadImagesUserArg;
-
+	ReloadImagesCallback	reloadImagesCB;
+	void*					reloadImagesUserArg;
 
 	idGameCallbacks();
 
diff --git a/neo/framework/KeyInput.cpp b/neo/framework/KeyInput.cpp
index f2d8044..bc5c2e1 100644
--- a/neo/framework/KeyInput.cpp
+++ b/neo/framework/KeyInput.cpp
@@ -63,7 +63,7 @@ static const keyname_t keynames[] =
 	{"RIGHTARROW",		K_RIGHTARROW,		"#str_07026"},
 
 	{"ALT",				K_ALT,				"#str_07027"},
-	{"RIGHTALT",		K_RIGHT_ALT,		"#str_07027"},
+	//{"RIGHTALT",		K_RIGHT_ALT,		"#str_07027"}, // DG: renamed this, see below
 	{"CTRL",			K_CTRL,				"#str_07028"},
 	{"SHIFT",			K_SHIFT,			"#str_07029"},
 
@@ -182,13 +182,18 @@ static const keyname_t keynames[] =
 
 	{"SEMICOLON",		';',				"#str_07129"},	// because a raw semicolon separates commands
 	{"APOSTROPHE",		'\'',				"#str_07130"},	// because a raw apostrophe messes with parsing
+	{"QUOTE",			'"',				""}, // DG: raw quote can't be good either
 
-	{NULL,				0,					NULL}
-};
+	{"R_ALT",			K_RIGHT_ALT,		""}, // DG: renamed this from RIGHTALT so it's shorter (but discernible) in the menu
+	{"R_CTRL",			K_RIGHT_CTRL, 		""}, // DG: added this one
+	{"R_SHIFT",			K_RIGHT_SHIFT,		""}, // DG: added this one
 
+	// TODO: controller stuff
 
+	{NULL,				0,					NULL}
+};
 
-static const int	MAX_KEYS = 256;
+static const int	MAX_KEYS = K_LAST_KEY+1; // DG: was 256, made it more flexible
 
 class idKey {
 public:
@@ -250,6 +255,13 @@ void idKeyInput::ArgCompletion_KeyName( const idCmdArgs &args, void(*callback)(
 	for ( kn = keynames; kn->name; kn++ ) {
 		callback( va( "%s %s", args.Argv( 0 ), kn->name ) );
 	}
+
+	for( int scKey = K_FIRST_SCANCODE; scKey <= K_LAST_SCANCODE; ++scKey ) {
+		const char* scName = Sys_GetScancodeName( scKey );
+		if ( scName != NULL ) {
+			callback( va( "%s %s", args.Argv( 0 ), scName ) );
+		}
+	}
 }
 
 /*
@@ -280,6 +292,15 @@ bool idKeyInput::IsDown( int keynum ) {
 		return false;
 	}
 
+	// DG: K_RIGHT_CTRL/SHIFT should be handled as different keys for bindings
+	//     but the same for keyboard shortcuts in the console and such
+	//     (this function is used for the latter)
+	if ( keynum == K_CTRL ) {
+		return keys[K_CTRL].down || keys[K_RIGHT_CTRL].down;
+	} else if ( keynum == K_SHIFT ) {
+		return keys[K_SHIFT].down || keys[K_RIGHT_SHIFT].down;
+	}
+
 	return keys[keynum].down;
 }
 
@@ -330,6 +351,11 @@ int idKeyInput::StringToKeyNum( const char *str ) {
 		return n1 * 16 + n2;
 	}
 
+	// DG: scancode names start with "SC_"
+	if ( idStr::Icmpn( str, "SC_", 3 ) == 0 ) {
+		return Sys_GetKeynumForScancodeName( str );
+	}
+
 	// scan for a text match
 	for ( kn = keynames; kn->name; kn++ ) {
 		if ( !idStr::Icmp( str, kn->name ) ) {
@@ -346,6 +372,9 @@ idKeyInput::KeyNumToString
 
 Returns a string (either a single ascii char, a K_* name, or a 0x11 hex string) for the
 given keynum.
+
+NOTE: with localized = true, the returned string is only valid until the next call (at least for K_SC_*)!
+      (currently this is no problem)
 ===================
 */
 const char *idKeyInput::KeyNumToString( int keynum, bool localized ) {
@@ -357,7 +386,7 @@ const char *idKeyInput::KeyNumToString( int keynum, bool localized ) {
 		return "<KEY NOT FOUND>";
 	}
 
-	if ( keynum < 0 || keynum > 255 ) {
+	if ( keynum < 0 || keynum >= MAX_KEYS ) {
 		return "<OUT OF RANGE>";
 	}
 
@@ -368,6 +397,18 @@ const char *idKeyInput::KeyNumToString( int keynum, bool localized ) {
 		return tinystr;
 	}
 
+	if ( keynum >= K_FIRST_SCANCODE && keynum <= K_LAST_SCANCODE ) {
+		const char* scName = NULL;
+		if ( localized ) {
+			scName = Sys_GetLocalizedScancodeName( keynum );
+		} else {
+			scName = Sys_GetScancodeName( keynum );
+		}
+		if ( scName != NULL ) {
+			return scName;
+		}
+	}
+
 	// check for a key string
 	for ( kn = keynames; kn->name; kn++ ) {
 		if ( keynum == kn->keynum ) {
@@ -709,7 +750,7 @@ void idKeyInput::PreliminaryKeyEvent( int keynum, bool down ) {
 	keys[keynum].down = down;
 
 #ifdef ID_DOOM_LEGACY
-	if ( down ) {
+	if ( down && keynum < 127 ) { // DG: only ASCII keys are of interest here
 		lastKeys[ 0 + ( lastKeyIndex & 15 )] = keynum;
 		lastKeys[16 + ( lastKeyIndex & 15 )] = keynum;
 		lastKeyIndex = ( lastKeyIndex + 1 ) & 15;
diff --git a/neo/framework/KeyInput.h b/neo/framework/KeyInput.h
index 6bff334..77179d3 100644
--- a/neo/framework/KeyInput.h
+++ b/neo/framework/KeyInput.h
@@ -197,9 +197,84 @@ typedef enum {
 
 	K_PRINT_SCR	= 252,	// SysRq / PrintScr
 	K_RIGHT_ALT = 253,	// used by some languages as "Alt-Gr"
-	K_LAST_KEY  = 254	// this better be < 256!
+
+	// DG: added the following two
+	K_RIGHT_CTRL = 254,
+	K_RIGHT_SHIFT = 255,
+
+	// DG: map all relevant scancodes from SDL to K_SC_* (taken from Yamagi Quake II)
+	// (relevant are ones that are likely to be keyboardlayout-dependent,
+	//  i.e. printable characters of sorts, *not* Ctrl, Alt, F1, Del, ...)
+	K_FIRST_SCANCODE = 256,
+
+	// !!! NOTE: if you add a scancode here, make sure to also add it to   !!!
+	// !!!       scancodemappings[] in sys/events.cpp (and preserve order) !!!
+	K_SC_A = K_FIRST_SCANCODE,
+	K_SC_B,
+	K_SC_C,
+	K_SC_D,
+	K_SC_E,
+	K_SC_F,
+	K_SC_G,
+	K_SC_H,
+	K_SC_I,
+	K_SC_J,
+	K_SC_K,
+	K_SC_L,
+	K_SC_M,
+	K_SC_N,
+	K_SC_O,
+	K_SC_P,
+	K_SC_Q,
+	K_SC_R,
+	K_SC_S,
+	K_SC_T,
+	K_SC_U,
+	K_SC_V,
+	K_SC_W,
+	K_SC_X,
+	K_SC_Y,
+	K_SC_Z,
+	// leaving out SDL_SCANCODE_1 ... _0, we handle them separately already
+	// also return, escape, backspace, tab, space, already handled as keycodes
+	K_SC_MINUS,
+	K_SC_EQUALS,
+	K_SC_LEFTBRACKET,
+	K_SC_RIGHTBRACKET,
+	K_SC_BACKSLASH,
+	K_SC_NONUSHASH,
+	K_SC_SEMICOLON,
+	K_SC_APOSTROPHE,
+	K_SC_GRAVE,
+	K_SC_COMMA,
+	K_SC_PERIOD,
+	K_SC_SLASH,
+	// leaving out lots of keys incl. from keypad, we already handle them as normal keys
+	K_SC_NONUSBACKSLASH,
+	K_SC_INTERNATIONAL1, /**< used on Asian keyboards, see footnotes in USB doc */
+	K_SC_INTERNATIONAL2,
+	K_SC_INTERNATIONAL3, /**< Yen */
+	K_SC_INTERNATIONAL4,
+	K_SC_INTERNATIONAL5,
+	K_SC_INTERNATIONAL6,
+	K_SC_INTERNATIONAL7,
+	K_SC_INTERNATIONAL8,
+	K_SC_INTERNATIONAL9,
+	K_SC_THOUSANDSSEPARATOR,
+	K_SC_DECIMALSEPARATOR,
+	K_SC_CURRENCYUNIT,
+	K_SC_CURRENCYSUBUNIT,
+
+	K_LAST_SCANCODE = K_SC_CURRENCYSUBUNIT, // TODO: keep up to date!
+
+	K_CONSOLE, // special keycode used for the "console key" and only to open/close the console (not bindable)
+
+	// FIXME: maybe move everything joystick related here
+
+	K_LAST_KEY // DG: this said "this better be < 256!"; I hope I fixed all places in code assuming this..
 } keyNum_t;
 
+enum { K_NUM_SCANCODES = K_LAST_SCANCODE - K_FIRST_SCANCODE + 1 };
 
 class idKeyInput {
 public:
diff --git a/neo/framework/Licensee.h b/neo/framework/Licensee.h
index c1594e3..812f4c2 100644
--- a/neo/framework/Licensee.h
+++ b/neo/framework/Licensee.h
@@ -41,12 +41,12 @@ If you have questions concerning this license or the applicable additional terms
 #define GAME_NAME						"dhewm 3"		// appears on window titles and errors
 #endif
 
-#define ENGINE_VERSION					"dhewm3 1.5.1"	// printed in console
+#define ENGINE_VERSION					"dhewm3 1.5.2pre"	// printed in console
 
 #ifdef ID_REPRODUCIBLE_BUILD
 	// for reproducible builds we hardcode values that would otherwise come from __DATE__ and __TIME__
 	// NOTE: remember to update esp. the date for (pre-) releases and RCs and the like
-	#define ID__DATE__  "Mar 13 2021"
+	#define ID__DATE__  "Apr 07 2021"
 	#define ID__TIME__  "13:37:42"
 
 #else // not reproducible build, use __DATE__ and __TIME__ macros
diff --git a/neo/framework/Session.cpp b/neo/framework/Session.cpp
index 697591e..9d5bcf0 100644
--- a/neo/framework/Session.cpp
+++ b/neo/framework/Session.cpp
@@ -58,6 +58,9 @@ idCVar	idSessionLocal::com_aviDemoTics( "com_aviDemoTics", "2", CVAR_SYSTEM | CV
 idCVar	idSessionLocal::com_wipeSeconds( "com_wipeSeconds", "1", CVAR_SYSTEM, "" );
 idCVar	idSessionLocal::com_guid( "com_guid", "", CVAR_SYSTEM | CVAR_ARCHIVE | CVAR_ROM, "" );
 
+idCVar	idSessionLocal::com_numQuicksaves( "com_numQuicksaves", "4", CVAR_SYSTEM|CVAR_ARCHIVE|CVAR_INTEGER,
+                                           "number of quicksaves to keep before overwriting the oldest", 1, 99 );
+
 idSessionLocal		sessLocal;
 idSession			*session = &sessLocal;
 
@@ -1458,7 +1461,7 @@ void idSessionLocal::LoadLoadingGui( const char *mapName ) {
 	stripped.StripPath();
 
 	char guiMap[ MAX_STRING_CHARS ];
-	strncpy( guiMap, va( "guis/map/%s.gui", stripped.c_str() ), MAX_STRING_CHARS );
+	idStr::Copynz( guiMap, va( "guis/map/%s.gui", stripped.c_str() ), MAX_STRING_CHARS );
 	// give the gamecode a chance to override
 	game->GetMapLoadingGUI( guiMap );
 
@@ -1753,8 +1756,7 @@ LoadGame_f
 void LoadGame_f( const idCmdArgs &args ) {
 	console->Close();
 	if ( args.Argc() < 2 || idStr::Icmp(args.Argv(1), "quick" ) == 0 ) {
-		idStr saveName = common->GetLanguageDict()->GetString( "#str_07178" );
-		sessLocal.LoadGame( saveName );
+		sessLocal.QuickLoad();
 	} else {
 		sessLocal.LoadGame( args.Argv(1) );
 	}
@@ -1767,10 +1769,7 @@ SaveGame_f
 */
 void SaveGame_f( const idCmdArgs &args ) {
 	if ( args.Argc() < 2 || idStr::Icmp( args.Argv(1), "quick" ) == 0 ) {
-		idStr saveName = common->GetLanguageDict()->GetString( "#str_07178" );
-		if ( sessLocal.SaveGame( saveName ) ) {
-			common->Printf( "%s\n", saveName.c_str() );
-		}
+		sessLocal.QuickSave();
 	} else {
 		if ( sessLocal.SaveGame( args.Argv(1) ) ) {
 			common->Printf( "Saved %s\n", args.Argv(1) );
@@ -2161,6 +2160,104 @@ bool idSessionLocal::LoadGame( const char *saveName ) {
 #endif
 }
 
+bool idSessionLocal::QuickSave()
+{
+	idStr saveName = common->GetLanguageDict()->GetString( "#str_07178" );
+
+	idStr saveFilePathBase = saveName;
+	ScrubSaveGameFileName( saveFilePathBase );
+	saveFilePathBase = "savegames/" + saveFilePathBase;
+
+	const char* game = cvarSystem->GetCVarString( "fs_game" );
+	if ( game != NULL && game[0] == '\0' ) {
+		game = NULL;
+	}
+
+	const int maxNum = com_numQuicksaves.GetInteger();
+	int indexToUse = 1;
+	ID_TIME_T oldestTime = 0;
+	for( int i = 1; i <= maxNum; ++i ) {
+		idStr saveFilePath = saveFilePathBase;
+		if ( i > 1 ) {
+			// the first one is just called "QuickSave" without a number, like before.
+			// the others are called "QuickSave2" "QuickSave3" etc
+			saveFilePath += i;
+		}
+		saveFilePath.SetFileExtension( ".save" );
+
+		idFile *f = fileSystem->OpenFileRead( saveFilePath, true, game );
+		if ( f == NULL ) {
+			// this savegame doesn't exist yet => we can use this index for the name
+			indexToUse = i;
+			break;
+		} else {
+			ID_TIME_T ts = f->Timestamp();
+			assert( ts != 0 );
+			if ( ts < oldestTime || oldestTime == 0 ) {
+				// this is the oldest quicksave we found so far => a candidate to be overwritten
+				indexToUse = i;
+				oldestTime = ts;
+			}
+			delete f;
+		}
+	}
+
+	if ( indexToUse > 1 ) {
+		saveName += indexToUse;
+	}
+
+	if ( SaveGame( saveName ) ) {
+		common->Printf( "%s\n", saveName.c_str() );
+		return true;
+	}
+	return false;
+}
+
+bool idSessionLocal::QuickLoad()
+{
+	idStr saveName = common->GetLanguageDict()->GetString( "#str_07178" );
+
+	idStr saveFilePathBase = saveName;
+	ScrubSaveGameFileName( saveFilePathBase );
+	saveFilePathBase = "savegames/" + saveFilePathBase;
+
+	const char* game = cvarSystem->GetCVarString( "fs_game" );
+	if ( game != NULL && game[0] == '\0' ) {
+		game = NULL;
+	}
+
+	// find the newest QuickSave (or QuickSave2, QuickSave3, ...)
+	const int maxNum = com_numQuicksaves.GetInteger();
+	int indexToUse = 1;
+	ID_TIME_T newestTime = 0;
+	for( int i = 1; i <= maxNum; ++i ) {
+		idStr saveFilePath = saveFilePathBase;
+		if ( i > 1 ) {
+			// the first one is just called "QuickSave" without a number, like before.
+			// the others are called "QuickSave2" "QuickSave3" etc
+			saveFilePath += i;
+		}
+		saveFilePath.SetFileExtension( ".save" );
+
+		idFile *f = fileSystem->OpenFileRead( saveFilePath, true, game );
+		if ( f != NULL ) {
+			ID_TIME_T ts = f->Timestamp();
+			assert( ts != 0 );
+			if ( ts > newestTime ) {
+				indexToUse = i;
+				newestTime = ts;
+			}
+			delete f;
+		}
+	}
+
+	if ( indexToUse > 1 ) {
+		saveName += indexToUse;
+	}
+
+	return sessLocal.LoadGame( saveName );
+}
+
 /*
 ===============
 idSessionLocal::ProcessEvent
@@ -2547,6 +2644,7 @@ void idSessionLocal::Frame() {
 		return;
 	}
 
+#if 0 // handled via Sys_GenerateEvents() -> handleMouseGrab()
 	// if the console is down, we don't need to hold
 	// the mouse cursor
 	if ( console->Active() || com_editorActive ) {
@@ -2554,6 +2652,7 @@ void idSessionLocal::Frame() {
 	} else {
 		Sys_GrabMouseCursor( true );
 	}
+#endif
 
 	// save the screenshot and audio from the last draw if needed
 	if ( aviCaptureMode ) {
diff --git a/neo/framework/Session_local.h b/neo/framework/Session_local.h
index febb154..45acd09 100644
--- a/neo/framework/Session_local.h
+++ b/neo/framework/Session_local.h
@@ -166,6 +166,9 @@ public:
 	// DG: added saveFileName so we can set a sensible filename for autosaves (see comment in MoveToNewMap())
 	bool				SaveGame(const char *saveName, bool autosave = false, const char* saveFileName = NULL);
 
+	bool				QuickSave();
+	bool				QuickLoad();
+
 	const char			*GetAuthMsg( void );
 
 	//=====================================
@@ -182,6 +185,7 @@ public:
 	static idCVar		com_aviDemoTics;
 	static idCVar		com_wipeSeconds;
 	static idCVar		com_guid;
+	static idCVar		com_numQuicksaves;
 
 	static idCVar		gui_configServerRate;
 
diff --git a/neo/framework/Session_menu.cpp b/neo/framework/Session_menu.cpp
index 1e6618f..b830c1f 100644
--- a/neo/framework/Session_menu.cpp
+++ b/neo/framework/Session_menu.cpp
@@ -388,6 +388,8 @@ bool idSessionLocal::HandleSaveGameMenuCommand( idCmdArgs &args, int &icmd ) {
 
 			sessLocal.SaveGame( saveGameName );
 			SetSaveGameGuiVars( );
+			// DG: select item 0 => select savegame just created (should be on top as it's newest)
+			guiActive->SetStateInt( "loadgame_sel_0", 0 );
 			guiActive->StateChanged( com_frameTime );
 		}
 		return true;
diff --git a/neo/framework/async/AsyncNetwork.cpp b/neo/framework/async/AsyncNetwork.cpp
index f44f504..6f8a0cf 100644
--- a/neo/framework/async/AsyncNetwork.cpp
+++ b/neo/framework/async/AsyncNetwork.cpp
@@ -172,10 +172,8 @@ idAsyncNetwork::RunFrame
 */
 void idAsyncNetwork::RunFrame( void ) {
 	if ( console->Active() ) {
-		Sys_GrabMouseCursor( false );
 		usercmdGen->InhibitUsercmd( INHIBIT_ASYNC, true );
 	} else {
-		Sys_GrabMouseCursor( true );
 		usercmdGen->InhibitUsercmd( INHIBIT_ASYNC, false );
 	}
 	client.RunFrame();
diff --git a/neo/game/GameEdit.cpp b/neo/game/GameEdit.cpp
index 0c637bd..57e7eb2 100644
--- a/neo/game/GameEdit.cpp
+++ b/neo/game/GameEdit.cpp
@@ -670,7 +670,7 @@ void idEditEntities::DisplayEntities( void ) {
 ===============================================================================
 */
 
-idGameEdit			gameEditLocal;
+idGameEditExt		gameEditLocal;
 idGameEdit *		gameEdit = &gameEditLocal;
 
 
@@ -1146,3 +1146,64 @@ void idGameEdit::MapEntityTranslate( const char *name, const idVec3 &v ) const {
 		}
 	}
 }
+
+/***********************************************************************
+
+  Debugger
+
+***********************************************************************/
+
+bool idGameEditExt::IsLineCode(const char* filename, int linenumber) const
+{
+	idStr fileStr;
+	idProgram* program = &gameLocal.program;
+	for (int i = 0; i < program->NumStatements(); i++)
+	{
+		fileStr = program->GetFilename(program->GetStatement(i).file);
+		fileStr.BackSlashesToSlashes();
+
+		if (strcmp(filename, fileStr.c_str()) == 0
+			&& program->GetStatement(i).linenumber == linenumber
+			)
+		{
+			return true;
+		}
+	}
+	return false;
+}
+
+void idGameEditExt::GetLoadedScripts( idStrList** result )
+{
+	(*result)->Clear();
+	idProgram* program = &gameLocal.program;
+
+	for (int i = 0; i < program->NumFilenames(); i++)
+	{
+		(*result)->AddUnique( idStr(program->GetFilename( i )) );
+	}
+}
+
+void idGameEditExt::MSG_WriteScriptList( idBitMsg* msg)
+{
+	idProgram* program = &gameLocal.program;
+
+	msg->WriteInt( program->NumFilenames() );
+	for (int i = 0; i < program->NumFilenames(); i++)
+	{
+		idStr file = program->GetFilename(i);
+		// FIXME: it seems that scripts triggered by the runtime are stored with a wrong path:
+		// they use '\' instead of '/', so normalize the separators before sending
+		file.BackSlashesToSlashes();
+		msg->WriteString(file);
+	}
+}
+
+const char* idGameEditExt::GetFilenameForStatement(idProgram* program, int index) const
+{
+	return program->GetFilenameForStatement(index);
+}
+
+int idGameEditExt::GetLineNumberForStatement(idProgram* program, int index) const
+{
+	return program->GetLineNumberForStatement(index);
+}
\ No newline at end of file
diff --git a/neo/game/Game_local.cpp b/neo/game/Game_local.cpp
index ef820d6..99f7e0f 100644
--- a/neo/game/Game_local.cpp
+++ b/neo/game/Game_local.cpp
@@ -270,6 +270,14 @@ bool IsDoom3DemoVersion()
 	return ret;
 }
 
+static bool ( *updateDebuggerFnPtr )( idInterpreter *interpreter, idProgram *program, int instructionPointer ) = NULL;
+bool updateGameDebugger( idInterpreter *interpreter, idProgram *program, int instructionPointer ) {
+	bool ret = false;
+	if ( interpreter != NULL && program != NULL ) {
+		ret = updateDebuggerFnPtr ? updateDebuggerFnPtr( interpreter, program, instructionPointer ) : false;
+	}
+	return ret;
+}
 
 
 /*
@@ -352,6 +360,8 @@ void idGameLocal::Init( void ) {
 
 	// DG: hack to support the Demo version of Doom3
 	common->GetAdditionalFunction(idCommon::FT_IsDemo, (idCommon::FunctionPointer*)&isDemoFnPtr, NULL);
+	//debugger support
+	common->GetAdditionalFunction(idCommon::FT_UpdateDebugger,(idCommon::FunctionPointer*) &updateDebuggerFnPtr,NULL);
 }
 
 /*
@@ -1192,7 +1202,7 @@ void idGameLocal::MapPopulate( void ) {
 idGameLocal::InitFromNewMap
 ===================
 */
-void idGameLocal::InitFromNewMap( const char *mapName, idRenderWorld *renderWorld, idSoundWorld *soundWorld, bool isServer, bool isClient, int randseed ) {
+void idGameLocal::InitFromNewMap( const char *mapName, idRenderWorld *renderWorld, idSoundWorld *soundWorld, bool isServer, bool isClient, int randseed) {
 
 	this->isServer = isServer;
 	this->isClient = isClient;
@@ -2216,13 +2226,13 @@ idGameLocal::RunFrame
 ================
 */
 gameReturn_t idGameLocal::RunFrame( const usercmd_t *clientCmds ) {
-	idEntity *	ent;
-	int			num;
-	float		ms;
-	idTimer		timer_think, timer_events, timer_singlethink;
-	gameReturn_t ret;
-	idPlayer	*player;
-	const renderView_t *view;
+	idEntity *			ent;
+	int					num;
+	float				ms;
+	idTimer				timer_think, timer_events, timer_singlethink;
+	gameReturn_t		ret;
+	idPlayer			*player;
+	const renderView_t	*view;
 
 #ifdef _DEBUG
 	if ( isMultiplayer ) {
@@ -2393,7 +2403,7 @@ gameReturn_t idGameLocal::RunFrame( const usercmd_t *clientCmds ) {
 
 		// see if a target_sessionCommand has forced a changelevel
 		if ( sessionCommand.Length() ) {
-			strncpy( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
+			idStr::Copynz( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
 			break;
 		}
 
@@ -4358,7 +4368,7 @@ idGameLocal::GetBestGameType
 ============
 */
 void idGameLocal::GetBestGameType( const char* map, const char* gametype, char buf[ MAX_STRING_CHARS ] ) {
-	strncpy( buf, gametype, MAX_STRING_CHARS );
+	idStr::Copynz( buf, gametype, MAX_STRING_CHARS );
 	buf[ MAX_STRING_CHARS - 1 ] = '\0';
 }
 
diff --git a/neo/game/Game_network.cpp b/neo/game/Game_network.cpp
index c26d906..d7e2268 100644
--- a/neo/game/Game_network.cpp
+++ b/neo/game/Game_network.cpp
@@ -1521,7 +1521,7 @@ gameReturn_t idGameLocal::ClientPrediction( int clientNum, const usercmd_t *clie
 	}
 
 	if ( sessionCommand.Length() ) {
-		strncpy( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
+		idStr::Copynz( ret.sessionCommand, sessionCommand, sizeof( ret.sessionCommand ) );
 	}
 	return ret;
 }
diff --git a/neo/game/gamesys/TypeInfo.cpp b/neo/game/gamesys/TypeInfo.cpp
index 2742c68..40f55e8 100644
--- a/neo/game/gamesys/TypeInfo.cpp
+++ b/neo/game/gamesys/TypeInfo.cpp
@@ -570,9 +570,18 @@ int idTypeInfoTools::WriteVariable_r( const void *varPtr, const char *varName, c
 	}
 
 	// if this is a pointer
+
+#if D3_SIZEOFPTR == 4
+	const uintptr_t uninitPtr = (uintptr_t)0xcdcdcdcdUL;
+#elif D3_SIZEOFPTR == 8
+	const uintptr_t uninitPtr = (uintptr_t)0xcdcdcdcdcdcdcdcdULL;
+#else
+	#error "Unexpected pointer size"
+#endif
+
 	isPointer = 0;
 	for ( i = typeString.Length(); i > 0 && typeString[i - 1] == '*'; i -= 2 ) {
-		if ( varPtr == (void *)0xcdcdcdcd || ( varPtr != NULL && *((unsigned int *)varPtr) == 0xcdcdcdcd ) ) {
+		if ( varPtr == (void*)uninitPtr || ( varPtr != NULL && *((unsigned int *)varPtr) == 0xcdcdcdcd ) ) {
 			common->Warning( "%s%s::%s%s references uninitialized memory", prefix, scope, varName, "" );
 			return typeSize;
 		}
diff --git a/neo/game/script/Script_Compiler.cpp b/neo/game/script/Script_Compiler.cpp
index 196d463..5ee0c77 100644
--- a/neo/game/script/Script_Compiler.cpp
+++ b/neo/game/script/Script_Compiler.cpp
@@ -28,6 +28,7 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "sys/platform.h"
 #include "idlib/Timer.h"
+#include "framework/FileSystem.h"
 
 #include "script/Script_Thread.h"
 #include "Game_local.h"
@@ -2620,6 +2621,8 @@ void idCompiler::CompileFile( const char *text, const char *filename, bool toCon
 
 	compile_time.Start();
 
+	idStr origFileName = filename; // DG: filename pointer might become invalid when calling NextToken() below
+
 	scope				= &def_namespace;
 	basetype			= NULL;
 	callthread			= false;
@@ -2687,6 +2690,11 @@ void idCompiler::CompileFile( const char *text, const char *filename, bool toCon
 
 	compile_time.Stop();
 	if ( !toConsole ) {
-		gameLocal.Printf( "Compiled '%s': %u ms\n", filename, compile_time.Milliseconds() );
+		// DG: filename can be overwritten by NextToken() (via gameLocal.program.GetFilenum()), so
+		//     use a copy, origFileName, that's still valid here. Furthermore, the path is nonsense,
+		//     as idProgram::CompileText() called fileSystem->RelativePathToOSPath() on it
+		//     which does not return the *actual* full path of that file but invents one,
+		//     so revert that to the relative filename which at least isn't misleading
+		gameLocal.Printf( "Compiled '%s': %u ms\n", fileSystem->OSPathToRelativePath(origFileName), compile_time.Milliseconds() );
 	}
 }
diff --git a/neo/game/script/Script_Interpreter.cpp b/neo/game/script/Script_Interpreter.cpp
index b6d044c..11fd06a 100644
--- a/neo/game/script/Script_Interpreter.cpp
+++ b/neo/game/script/Script_Interpreter.cpp
@@ -33,6 +33,11 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "script/Script_Interpreter.h"
 
+#include "framework/FileSystem.h"
+
+// HvG: Debugger support
+extern bool updateGameDebugger( idInterpreter *interpreter, idProgram *program, int instructionPointer );
+
 /*
 ================
 idInterpreter::idInterpreter()
@@ -183,7 +188,6 @@ idInterpreter::GetRegisterValue
 Returns a string representation of the value of the register.  This is
 used primarily for the debugger and debugging
 
-//FIXME:  This is pretty much wrong.  won't access data in most situations.
 ================
 */
 bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDepth ) {
@@ -191,17 +195,17 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 	idVarDef		*d;
 	char			funcObject[ 1024 ];
 	char			*funcName;
-	const idVarDef	*scope;
+	const idVarDef	*scope = NULL;
+	const idVarDef	*scopeObj;
 	const idTypeDef	*field;
-	const idScriptObject *obj;
 	const function_t *func;
 
 	out.Empty();
-
+	
 	if ( scopeDepth == -1 ) {
 		scopeDepth = callStackDepth;
-	}
-
+	}	
+	
 	if ( scopeDepth == callStackDepth ) {
 		func = currentFunction;
 	} else {
@@ -215,35 +219,44 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 	funcName = strstr( funcObject, "::" );
 	if ( funcName ) {
 		*funcName = '\0';
-		scope = gameLocal.program.GetDef( NULL, funcObject, &def_namespace );
-		funcName += 2;
+		scopeObj = gameLocal.program.GetDef( NULL, funcObject, &def_namespace );
+		funcName += 2;				
+		if ( scopeObj )
+		{
+			scope = gameLocal.program.GetDef( NULL, funcName, scopeObj );
+		}
 	} else {
 		funcName = funcObject;
-		scope = &def_namespace;
+		scope = gameLocal.program.GetDef( NULL, func->Name(), &def_namespace );
+		scopeObj = NULL;
 	}
 
-	// Get the function from the object
-	d = gameLocal.program.GetDef( NULL, funcName, scope );
-	if ( !d ) {
+	if ( !scope )
+	{
 		return false;
 	}
 
-	// Get the variable itself and check various namespaces
-	d = gameLocal.program.GetDef( NULL, name, d );
-	if ( !d ) {
-		if ( scope == &def_namespace ) {
-			return false;
-		}
-
-		d = gameLocal.program.GetDef( NULL, name, scope );
-		if ( !d ) {
-			d = gameLocal.program.GetDef( NULL, name, &def_namespace );
-			if ( !d ) {
-				return false;
+	d = gameLocal.program.GetDef( NULL, name, scope );
+	
+	// Check the objects for it if it wasn't local to the function
+	if ( !d )
+	{
+		for ( ; scopeObj && scopeObj->TypeDef()->SuperClass(); scopeObj = scopeObj->TypeDef()->SuperClass()->def )
+		{
+			d = gameLocal.program.GetDef( NULL, name, scopeObj );
+			if ( d )
+			{
+				break;
 			}
 		}
-	}
+	}	
 
+	if ( !d )
+	{
+		out = "???";
+		return false;
+	}
+	
 	reg = GetVariable( d );
 	switch( d->Type() ) {
 	case ev_float:
@@ -256,7 +269,7 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 		break;
 
 	case ev_vector:
-		if ( reg.vectorPtr ) {
+		if ( reg.vectorPtr ) {				
 			out = va( "%g,%g,%g", reg.vectorPtr->x, reg.vectorPtr->y, reg.vectorPtr->z );
 		} else {
 			out = "0,0,0";
@@ -274,30 +287,55 @@ bool idInterpreter::GetRegisterValue( const char *name, idStr &out, int scopeDep
 		break;
 
 	case ev_field:
+	{
+		idEntity*		entity;			
+		idScriptObject*	obj;
+		
 		if ( scope == &def_namespace ) {
 			// should never happen, but handle it safely anyway
 			return false;
 		}
 
-		field = scope->TypeDef()->GetParmType( reg.ptrOffset )->FieldType();
-		obj   = *reinterpret_cast<const idScriptObject **>( &localstack[ callStack[ callStackDepth ].stackbase ] );
-		if ( !field || !obj ) {
+		field  = d->TypeDef()->FieldType();
+		entity = GetEntity ( *((int*)&localstack[ localstackBase ]) );
+		if ( !entity || !field )
+		{
 			return false;
 		}
 
+		obj = &entity->scriptObject;
+		if ( !obj ) {
+			return false;
+		}
+		
 		switch ( field->Type() ) {
-		case ev_boolean:
-			out = va( "%d", *( reinterpret_cast<int *>( &obj->data[ reg.ptrOffset ] ) ) );
-			return true;
-
-		case ev_float:
-			out = va( "%g", *( reinterpret_cast<float *>( &obj->data[ reg.ptrOffset ] ) ) );
-			return true;
+			case ev_boolean:
+				out = va( "%d", *( reinterpret_cast<int *>( &obj->data[ reg.ptrOffset ] ) ) );
+				return true;
+
+			case ev_float:
+				out = va( "%g", *( reinterpret_cast<float *>( &obj->data[ reg.ptrOffset ] ) ) );
+				return true;
+				
+			case ev_string:	{
+				const char* str;
+				str = reinterpret_cast<const char*>( &obj->data[ reg.ptrOffset ] );
+				if ( !str ) {
+					out = "\"\"";
+				} else {
+					out  = "\"";
+					out += str;			
+					out += "\"";
+				}
+				return true;
+			}
 
-		default:
-			return false;
+			default:
+				return false;
 		}
+		
 		break;
+	}
 
 	case ev_string:
 		if ( reg.stringPtr ) {
@@ -969,6 +1007,19 @@ bool idInterpreter::Execute( void ) {
 		// next statement
 		st = &gameLocal.program.GetStatement( instructionPointer );
 
+		if ( !updateGameDebugger( this, &gameLocal.program, instructionPointer )
+			&& g_debugScript.GetBool( ) ) 
+		{
+			static int lastLineNumber = -1;
+			if ( lastLineNumber != gameLocal.program.GetStatement ( instructionPointer ).linenumber ) {				
+				gameLocal.Printf ( "%s (%d)\n", 
+					gameLocal.program.GetFilename ( gameLocal.program.GetStatement ( instructionPointer ).file ),
+					gameLocal.program.GetStatement ( instructionPointer ).linenumber
+					);
+				lastLineNumber = gameLocal.program.GetStatement ( instructionPointer ).linenumber;
+			}
+		}
+
 		switch( st->op ) {
 		case OP_RETURN:
 			LeaveFunction( st->a );
@@ -1833,3 +1884,98 @@ bool idInterpreter::Execute( void ) {
 
 	return threadDying;
 }
+
+bool idGameEditExt::CheckForBreakPointHit(const idInterpreter* interpreter, const function_t* function1, const function_t* function2, int depth) const
+{
+	return ( ( interpreter->GetCurrentFunction ( ) == function1 ||
+			   interpreter->GetCurrentFunction ( ) == function2)&&
+			 ( interpreter->GetCallstackDepth ( )  <= depth) );
+}
+
+bool idGameEditExt::ReturnedFromFunction(const idProgram* program, const idInterpreter* interpreter, int index) const
+{
+
+	return ( const_cast<idProgram*>(program)->GetStatement(index).op == OP_RETURN && interpreter->GetCallstackDepth ( ) <= 1 );
+}
+
+bool idGameEditExt::GetRegisterValue(const idInterpreter* interpreter, const char* name, idStr& out, int scopeDepth) const
+{
+	return const_cast<idInterpreter*>(interpreter)->GetRegisterValue(name, out, scopeDepth);
+}
+
+const idThread*idGameEditExt::GetThread(const idInterpreter* interpreter) const
+{
+	return interpreter->GetThread();
+}
+
+void idGameEditExt::MSG_WriteCallstackFunc(idBitMsg* msg, const prstack_t* stack, const idProgram * program, int instructionPtr)
+{
+	const statement_t*	st;
+	const function_t*	func;
+
+	func  = stack->f;
+
+	// If the function is unknown then just fill in with default data.
+	if ( !func )
+	{
+		msg->WriteString ( "<UNKNOWN>" );
+		msg->WriteString ( "<UNKNOWN>" );
+		msg->WriteInt ( 0 );
+		return;
+	}
+	else
+	{
+		msg->WriteString ( va("%s(  )", func->Name() ) );
+	}
+
+	if (stack->s == -1) //this is a fake stack created by debugger, use instruction pointer for retrieval.
+		st = &const_cast<idProgram*>( program )->GetStatement( instructionPtr );
+	else // Use the calling statement as the filename and linenumber where the call was made from		
+		st = &const_cast<idProgram*>( program )->GetStatement ( stack->s );
+
+	if ( st )
+	{
+		idStr qpath = const_cast<idProgram*>( program )->GetFilename( st->file );
+		if (idStr::FindChar( qpath, ':' ) != -1)
+			qpath = fileSystem->OSPathToRelativePath( qpath.c_str() );
+		qpath.BackSlashesToSlashes ( );
+		msg->WriteString( qpath );
+		msg->WriteInt( st->linenumber );
+	}
+	else 
+	{
+		msg->WriteString ( "<UNKNOWN>" );
+		msg->WriteInt ( 0 );
+	}
+}
+
+void idGameEditExt::MSG_WriteInterpreterInfo(idBitMsg* msg, const idInterpreter* interpreter, const idProgram* program, int instructionPtr)
+{
+	int			i;
+	prstack_s	temp;
+
+	msg->WriteShort((int)interpreter->GetCallstackDepth( ) );
+
+	// write out the current function
+	temp.f = interpreter->GetCurrentFunction( );
+	temp.s = -1;
+	temp.stackbase = 0;
+	MSG_WriteCallstackFunc( msg, &temp, program, instructionPtr);
+
+	// Run through all of the callstack and write each to the msg
+	for (i = interpreter->GetCallstackDepth() - 1; i > 0; i--)
+	{
+		MSG_WriteCallstackFunc( msg, interpreter->GetCallstack( ) + i, program, instructionPtr);
+	}
+}
+
+
+int idGameEditExt::GetInterpreterCallStackDepth(const idInterpreter* interpreter)
+{
+	return interpreter->GetCallstackDepth();
+}
+
+const function_t*idGameEditExt::GetInterpreterCallStackFunction( const idInterpreter* interpreter, int stackDepth/* = -1*/)
+{
+	return interpreter->GetCallstack( )[ stackDepth > -1 ? stackDepth :interpreter->GetCallstackDepth( ) ].f;
+}
\ No newline at end of file
diff --git a/neo/game/script/Script_Thread.cpp b/neo/game/script/Script_Thread.cpp
index 58daa5d..546ede9 100644
--- a/neo/game/script/Script_Thread.cpp
+++ b/neo/game/script/Script_Thread.cpp
@@ -28,11 +28,11 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "sys/platform.h"
 
-#include "gamesys/SysCvar.h"
-#include "Player.h"
-#include "Camera.h"
+#include "game/gamesys/SysCvar.h"
+#include "game/Player.h"
+#include "game/Camera.h"
 
-#include "script/Script_Thread.h"
+#include "Script_Thread.h"
 
 const idEventDef EV_Thread_Execute( "<execute>", NULL );
 const idEventDef EV_Thread_SetCallback( "<script_setcallback>", NULL );
@@ -1841,3 +1841,49 @@ void idThread::Event_InfluenceActive( void ) {
 		idThread::ReturnInt( false );
 	}
 }
+
+int idGameEditExt::ThreadGetNum(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->GetThreadNum();
+}
+
+const char*idGameEditExt::ThreadGetName(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->GetThreadName();
+}
+
+int	idGameEditExt::GetTotalScriptThreads() const
+{
+	return idThread::GetThreads().Num();
+}
+
+const idThread*idGameEditExt::GetThreadByIndex(int index) const
+{
+	return idThread::GetThreads()[index];
+}
+
+bool idGameEditExt::ThreadIsDoneProcessing(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsDoneProcessing();
+}
+
+bool idGameEditExt::ThreadIsWaiting(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsWaiting();
+}
+
+bool idGameEditExt::ThreadIsDying(const idThread* thread) const
+{
+	return const_cast<idThread*>(thread)->IsDying();
+}
+
+void idGameEditExt::MSG_WriteThreadInfo(idBitMsg* msg, const idThread* thread, const idInterpreter* interpreter)
+{
+	msg->WriteString(const_cast<idThread*>(thread)->GetThreadName());
+	msg->WriteInt(const_cast<idThread*>(thread)->GetThreadNum());
+
+	msg->WriteBits((int)(thread == interpreter->GetThread()), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsDoneProcessing(), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsWaiting(), 1);
+	msg->WriteBits((int)const_cast<idThread*>(thread)->IsDying(), 1);
+}
\ No newline at end of file
diff --git a/neo/idlib/BitMsg.h b/neo/idlib/BitMsg.h
index e853bf7..f27b2da 100644
--- a/neo/idlib/BitMsg.h
+++ b/neo/idlib/BitMsg.h
@@ -85,7 +85,7 @@ public:
 	void			WriteShort( int c );
 	void			WriteUShort( int c );
 	void			WriteInt( int c );
-	void			WriteFloat( float f );
+	void			WriteFloat( float f );	
 	void			WriteFloat( float f, int exponentBits, int mantissaBits );
 	void			WriteAngle8( float f );
 	void			WriteAngle16( float f );
diff --git a/neo/idlib/Heap.cpp b/neo/idlib/Heap.cpp
index a76fb4a..e69f008 100644
--- a/neo/idlib/Heap.cpp
+++ b/neo/idlib/Heap.cpp
@@ -1275,7 +1275,7 @@ const char *Mem_CleanupFileName( const char *fileName ) {
 		newFileName = newFileName.Left( i2 - 1 ) + newFileName.Right( newFileName.Length() - ( i1 + 4 ) );
 	}
 	index = ( index + 1 ) & 3;
-	strncpy( newFileNames[index], newFileName.c_str(), sizeof( newFileNames[index] ) );
+	idStr::Copynz( newFileNames[index], newFileName.c_str(), sizeof( newFileNames[index] ) );
 	return newFileNames[index];
 }
 
diff --git a/neo/idlib/Str.cpp b/neo/idlib/Str.cpp
index 4ccfb96..e730233 100644
--- a/neo/idlib/Str.cpp
+++ b/neo/idlib/Str.cpp
@@ -30,10 +30,15 @@ If you have questions concerning this license or the applicable additional terms
 #include "idlib/math/Vector.h"
 #include "idlib/Heap.h"
 #include "framework/Common.h"
+#include <limits.h>
 
 #include "idlib/Str.h"
 
-#if !defined( ID_REDIRECT_NEWDELETE ) && !defined( MACOS_X )
+// DG: idDynamicBlockAlloc isn't thread-safe and idStr is used both in the main thread
+//     and the async thread! For some reason this seems to cause lots of problems on
+//     newer Linux distros if dhewm3 is built with GCC9 or newer (see #391).
+//     No idea why it apparently didn't cause such (noticeable) issues before..
+#if 0 // !defined( ID_REDIRECT_NEWDELETE ) && !defined( MACOS_X )
 	#define USE_STRING_DATA_ALLOCATOR
 #endif
 
@@ -100,23 +105,30 @@ void idStr::ReAllocate( int amount, bool keepold ) {
 
 #ifdef USE_STRING_DATA_ALLOCATOR
 	newbuffer = stringDataAllocator.Alloc( alloced );
-#else
-	newbuffer = new char[ alloced ];
-#endif
 	if ( keepold && data ) {
 		data[ len ] = '\0';
 		strcpy( newbuffer, data );
 	}
 
 	if ( data && data != baseBuffer ) {
-#ifdef USE_STRING_DATA_ALLOCATOR
 		stringDataAllocator.Free( data );
-#else
-		delete [] data;
-#endif
 	}
 
 	data = newbuffer;
+#else
+	if ( data && data != baseBuffer ) {
+		data = (char *)realloc( data, newsize );
+	} else {
+		newbuffer = (char *)malloc( newsize );
+		if ( data && keepold ) {
+			memcpy( newbuffer, data, len );
+			newbuffer[ len ] = '\0';
+		} else {
+			newbuffer[ 0 ] = '\0';
+		}
+		data = newbuffer;
+	}
+#endif
 }
 
 /*
@@ -129,7 +141,7 @@ void idStr::FreeData( void ) {
 #ifdef USE_STRING_DATA_ALLOCATOR
 		stringDataAllocator.Free( data );
 #else
-		delete[] data;
+		free( data );
 #endif
 		data = baseBuffer;
 	}
@@ -1502,21 +1514,25 @@ idStr::snPrintf
 ================
 */
 int idStr::snPrintf( char *dest, int size, const char *fmt, ...) {
-	int len;
 	va_list argptr;
-	char buffer[32000];	// big, but small enough to fit in PPC stack
-
+	int len;
 	va_start( argptr, fmt );
-	len = vsprintf( buffer, fmt, argptr );
+	len = D3_vsnprintfC99(dest, size, fmt, argptr);
 	va_end( argptr );
-	if ( len >= sizeof( buffer ) ) {
+	if ( len >= 32000 ) {
+		// TODO: Previously this function used a 32000 byte buffer to write into
+		//       with vsprintf(), and raised this error if that was overflowed
+		//       (more likely that'd have led to a crash..).
+		//       Technically we don't have that restriction anymore, so I'm unsure
+		//       if this error should really still be raised to preserve
+		//       the old intended behavior, maybe for compat with mod DLLs using
+		//       the old version of the function or something?
 		idLib::common->Error( "idStr::snPrintf: overflowed buffer" );
 	}
 	if ( len >= size ) {
 		idLib::common->Warning( "idStr::snPrintf: overflow of %i in %i\n", len, size );
 		len = size;
 	}
-	idStr::Copynz( dest, buffer, size );
 	return len;
 }
 
@@ -1539,18 +1555,7 @@ or returns -1 on failure or if the buffer would be overflowed.
 ============
 */
 int idStr::vsnPrintf( char *dest, int size, const char *fmt, va_list argptr ) {
-	int ret;
-
-#ifdef _WIN32
-#undef _vsnprintf
-	ret = _vsnprintf( dest, size-1, fmt, argptr );
-#define _vsnprintf	use_idStr_vsnPrintf
-#else
-#undef vsnprintf
-	ret = vsnprintf( dest, size, fmt, argptr );
-#define vsnprintf	use_idStr_vsnPrintf
-#endif
-	dest[size-1] = '\0';
+	int ret = D3_vsnprintfC99(dest, size, fmt, argptr);
 	if ( ret < 0 || ret >= size ) {
 		return -1;
 	}
@@ -1779,3 +1784,57 @@ idStr idStr::FormatNumber( int number ) {
 
 	return string;
 }
+
+// behaves like C99's vsnprintf() by returning the amount of bytes that
+// *would* have been written into a big enough buffer, even if that's > size
+// unlike idStr::vsnPrintf() which returns -1 in that case
+int D3_vsnprintfC99(char *dst, size_t size, const char *format, va_list ap)
+{
+	// before VS2015, MSVC didn't have a standards-conforming (v)snprintf() implementation
+	// same might be true for other windows compilers if they use old CRT versions, like MinGW does
+#if defined(_WIN32) && (!defined(_MSC_VER) || _MSC_VER < 1900)
+  #undef _vsnprintf
+	// based on DG_vsnprintf() from https://github.com/DanielGibson/Snippets/blob/master/DG_misc.h
+	int ret = -1;
+	if(dst != NULL && size > 0)
+	{
+#if defined(_MSC_VER) && _MSC_VER >= 1400
+		// I think MSVC2005 introduced _vsnprintf_s().
+		// this shuts up _vsnprintf() security/deprecation warnings.
+		ret = _vsnprintf_s(dst, size, _TRUNCATE, format, ap);
+#else
+		ret = _vsnprintf(dst, size, format, ap);
+		dst[size-1] = '\0'; // ensure '\0'-termination
+#endif
+	}
+
+	if(ret == -1)
+	{
+		// _vsnprintf() returns -1 if the output is truncated
+		// it's also -1 if dst or size were NULL/0, so the user didn't want to write
+		// we want to return the number of characters that would've been
+		// needed, though.. fortunately _vscprintf() calculates that.
+		ret = _vscprintf(format, ap);
+	}
+	return ret;
+  #define _vsnprintf	use_idStr_vsnPrintf
+#else // other operating systems and VisualC++ >= 2015 should have a proper vsnprintf()
+  #undef vsnprintf
+	return vsnprintf(dst, size, format, ap);
+  #define vsnprintf	use_idStr_vsnPrintf
+#endif
+}
+
+// behaves like C99's snprintf() by returning the amount of bytes that
+// *would* have been written into a big enough buffer, even if that's > size
+// unlike idStr::snPrintf() which returns the written bytes in that case
+// and also calls common->Warning() in case of overflows
+int D3_snprintfC99(char *dst, size_t size, const char *format, ...)
+{
+	int ret = 0;
+	va_list argptr;
+	va_start( argptr, format );
+	ret = D3_vsnprintfC99(dst, size, format, argptr);
+	va_end( argptr );
+	return ret;
+}
diff --git a/neo/idlib/Str.h b/neo/idlib/Str.h
index 5dfabe9..a44bab2 100644
--- a/neo/idlib/Str.h
+++ b/neo/idlib/Str.h
@@ -1068,4 +1068,15 @@ ID_INLINE int idStr::DynamicMemoryUsed() const {
 	return ( data == baseBuffer ) ? 0 : alloced;
 }
 
+// behaves like C99's snprintf() by returning the amount of bytes that
+// *would* have been written into a big enough buffer, even if that's > size
+// unlike idStr::snPrintf() which returns the written bytes in that case
+// and also calls common->Warning() in case of overflows
+int D3_snprintfC99(char *dst, size_t size, const char *format, ...) id_attribute((format(printf,3,4)));
+
+// behaves like C99's vsnprintf() by returning the amount of bytes that
+// *would* have been written into a big enough buffer, even if that's > size
+// unlike idStr::vsnPrintf() which returns -1 in that case
+int D3_vsnprintfC99(char *dst, size_t size, const char *format, va_list ap);
+
 #endif /* !__STR_H__ */
diff --git a/neo/idlib/geometry/Surface.h b/neo/idlib/geometry/Surface.h
index 983bce2..b88e3ab 100644
--- a/neo/idlib/geometry/Surface.h
+++ b/neo/idlib/geometry/Surface.h
@@ -123,7 +123,9 @@ idSurface::idSurface
 ID_INLINE idSurface::idSurface( const idDrawVert *verts, const int numVerts, const int *indexes, const int numIndexes ) {
 	assert( verts != NULL && indexes != NULL && numVerts > 0 && numIndexes > 0 );
 	this->verts.SetNum( numVerts );
-	memcpy( this->verts.Ptr(), verts, numVerts * sizeof( verts[0] ) );
+	for (int i = 0; i < numVerts; i++) {
+		this->verts[i] = verts[i];
+	}
 	this->indexes.SetNum( numIndexes );
 	memcpy( this->indexes.Ptr(), indexes, numIndexes * sizeof( indexes[0] ) );
 	GenerateEdgeIndexes();
diff --git a/neo/idlib/geometry/TraceModel.cpp b/neo/idlib/geometry/TraceModel.cpp
index b1a88ff..62dd273 100644
--- a/neo/idlib/geometry/TraceModel.cpp
+++ b/neo/idlib/geometry/TraceModel.cpp
@@ -1164,6 +1164,7 @@ int idTraceModel::GetOrderedSilhouetteEdges( const int edgeIsSilEdge[MAX_TRACEMO
 	int i, j, edgeNum, numSilEdges, nextSilVert;
 	int unsortedSilEdges[MAX_TRACEMODEL_EDGES];
 
+	unsortedSilEdges[0] = 0;
 	numSilEdges = 0;
 	for ( i = 1; i <= numEdges; i++ ) {
 		if ( edgeIsSilEdge[i] ) {
@@ -1409,7 +1410,10 @@ void idTraceModel::VolumeIntegrals( struct volumeIntegrals_s &integrals ) const
 	int i, a, b, c;
 	float nx, ny, nz;
 
-	memset( &integrals, 0, sizeof(volumeIntegrals_t) );
+	integrals.T0 = 0.0f;
+	integrals.T1.Zero();
+	integrals.T2.Zero();
+	integrals.TP.Zero();
 	for ( i = 0; i < numPolys; i++ ) {
 		poly = &polys[i];
 
diff --git a/neo/idlib/math/Matrix.h b/neo/idlib/math/Matrix.h
index 77b0adc..896c056 100644
--- a/neo/idlib/math/Matrix.h
+++ b/neo/idlib/math/Matrix.h
@@ -134,7 +134,8 @@ ID_INLINE idMat2::idMat2( const float xx, const float xy, const float yx, const
 }
 
 ID_INLINE idMat2::idMat2( const float src[ 2 ][ 2 ] ) {
-	memcpy( mat, src, 2 * 2 * sizeof( float ) );
+	mat[0].x = src[0][0]; mat[0].y = src[0][1];
+	mat[1].x = src[1][0]; mat[1].y = src[1][1];
 }
 
 ID_INLINE const idVec2 &idMat2::operator[]( int index ) const {
@@ -438,7 +439,9 @@ ID_INLINE idMat3::idMat3( const float xx, const float xy, const float xz, const
 }
 
 ID_INLINE idMat3::idMat3( const float src[ 3 ][ 3 ] ) {
-	memcpy( mat, src, 3 * 3 * sizeof( float ) );
+	mat[0].x = src[0][0]; mat[0].y = src[0][1]; mat[0].z = src[0][2];
+	mat[1].x = src[1][0]; mat[1].y = src[1][1]; mat[1].z = src[1][2];
+	mat[2].x = src[2][0]; mat[2].y = src[2][1]; mat[2].z = src[2][2];
 }
 
 ID_INLINE const idVec3 &idMat3::operator[]( int index ) const {
@@ -595,7 +598,9 @@ ID_INLINE bool idMat3::operator!=( const idMat3 &a ) const {
 }
 
 ID_INLINE void idMat3::Zero( void ) {
-	memset( mat, 0, sizeof( idMat3 ) );
+	mat[0].x = 0.0f; mat[0].y = 0.0f; mat[0].z = 0.0f;
+	mat[1].x = 0.0f; mat[1].y = 0.0f; mat[1].z = 0.0f;
+	mat[2].x = 0.0f; mat[2].y = 0.0f; mat[2].z = 0.0f;
 }
 
 ID_INLINE void idMat3::Identity( void ) {
@@ -881,7 +886,10 @@ ID_INLINE idMat4::idMat4( const idMat3 &rotation, const idVec3 &translation ) {
 }
 
 ID_INLINE idMat4::idMat4( const float src[ 4 ][ 4 ] ) {
-	memcpy( mat, src, 4 * 4 * sizeof( float ) );
+	mat[0].x = src[0][0]; mat[0].y = src[0][1]; mat[0].z = src[0][2]; mat[0].w = src[0][3];
+	mat[1].x = src[1][0]; mat[1].y = src[1][1]; mat[1].z = src[1][2]; mat[1].w = src[1][3];
+	mat[2].x = src[2][0]; mat[2].y = src[2][1]; mat[2].z = src[2][2]; mat[2].w = src[2][3];
+	mat[3].x = src[3][0]; mat[3].y = src[3][1]; mat[3].z = src[3][2]; mat[3].w = src[3][3];
 }
 
 ID_INLINE const idVec4 &idMat4::operator[]( int index ) const {
@@ -1057,7 +1065,10 @@ ID_INLINE bool idMat4::operator!=( const idMat4 &a ) const {
 }
 
 ID_INLINE void idMat4::Zero( void ) {
-	memset( mat, 0, sizeof( idMat4 ) );
+	mat[0].x = 0.0f; mat[0].y = 0.0f; mat[0].z = 0.0f; mat[0].w = 0.0f;
+	mat[1].x = 0.0f; mat[1].y = 0.0f; mat[1].z = 0.0f; mat[1].w = 0.0f;
+	mat[2].x = 0.0f; mat[2].y = 0.0f; mat[2].z = 0.0f; mat[2].w = 0.0f;
+	mat[3].x = 0.0f; mat[3].y = 0.0f; mat[3].z = 0.0f; mat[3].w = 0.0f;
 }
 
 ID_INLINE void idMat4::Identity( void ) {
@@ -1219,7 +1230,11 @@ ID_INLINE idMat5::idMat5( void ) {
 }
 
 ID_INLINE idMat5::idMat5( const float src[ 5 ][ 5 ] ) {
-	memcpy( mat, src, 5 * 5 * sizeof( float ) );
+	mat[0].x = src[0][0]; mat[0].y = src[0][1]; mat[0].z = src[0][2]; mat[0].s = src[0][3]; mat[0].t = src[0][4];
+	mat[1].x = src[1][0]; mat[1].y = src[1][1]; mat[1].z = src[1][2]; mat[1].s = src[1][3]; mat[1].t = src[1][4];
+	mat[2].x = src[2][0]; mat[2].y = src[2][1]; mat[2].z = src[2][2]; mat[2].s = src[2][3]; mat[2].t = src[2][4];
+	mat[3].x = src[3][0]; mat[3].y = src[3][1]; mat[3].z = src[3][2]; mat[3].s = src[3][3]; mat[3].t = src[3][4];
+	mat[4].x = src[4][0]; mat[4].y = src[4][1]; mat[4].z = src[4][2]; mat[4].s = src[4][3]; mat[4].t = src[4][4];
 }
 
 ID_INLINE idMat5::idMat5( const idVec5 &v0, const idVec5 &v1, const idVec5 &v2, const idVec5 &v3, const idVec5 &v4 ) {
@@ -1382,7 +1397,11 @@ ID_INLINE bool idMat5::operator!=( const idMat5 &a ) const {
 }
 
 ID_INLINE void idMat5::Zero( void ) {
-	memset( mat, 0, sizeof( idMat5 ) );
+	mat[0].x = 0.0f; mat[0].y = 0.0f; mat[0].z = 0.0f; mat[0].s = 0.0f; mat[0].t = 0.0f;
+	mat[1].x = 0.0f; mat[1].y = 0.0f; mat[1].z = 0.0f; mat[1].s = 0.0f; mat[1].t = 0.0f;
+	mat[2].x = 0.0f; mat[2].y = 0.0f; mat[2].z = 0.0f; mat[2].s = 0.0f; mat[2].t = 0.0f;
+	mat[3].x = 0.0f; mat[3].y = 0.0f; mat[3].z = 0.0f; mat[3].s = 0.0f; mat[3].t = 0.0f;
+	mat[4].x = 0.0f; mat[4].y = 0.0f; mat[4].z = 0.0f; mat[4].s = 0.0f; mat[4].t = 0.0f;
 }
 
 ID_INLINE void idMat5::Identity( void ) {
@@ -1536,7 +1555,12 @@ ID_INLINE idMat6::idMat6( const idVec6 &v0, const idVec6 &v1, const idVec6 &v2,
 }
 
 ID_INLINE idMat6::idMat6( const float src[ 6 ][ 6 ] ) {
-	memcpy( mat, src, 6 * 6 * sizeof( float ) );
+	memcpy( mat[0].ToFloatPtr(), src[0], 6 * sizeof( float ) );
+	memcpy( mat[1].ToFloatPtr(), src[1], 6 * sizeof( float ) );
+	memcpy( mat[2].ToFloatPtr(), src[2], 6 * sizeof( float ) );
+	memcpy( mat[3].ToFloatPtr(), src[3], 6 * sizeof( float ) );
+	memcpy( mat[4].ToFloatPtr(), src[4], 6 * sizeof( float ) );
+	memcpy( mat[5].ToFloatPtr(), src[5], 6 * sizeof( float ) );
 }
 
 ID_INLINE const idVec6 &idMat6::operator[]( int index ) const {
@@ -1699,7 +1723,9 @@ ID_INLINE bool idMat6::operator!=( const idMat6 &a ) const {
 }
 
 ID_INLINE void idMat6::Zero( void ) {
-	memset( mat, 0, sizeof( idMat6 ) );
+	for (int i = 0; i < 6; i++) {
+		mat[i].Zero();
+	}
 }
 
 ID_INLINE void idMat6::Identity( void ) {
diff --git a/neo/renderer/Cinematic.cpp b/neo/renderer/Cinematic.cpp
index 1b20794..e6bb4aa 100644
--- a/neo/renderer/Cinematic.cpp
+++ b/neo/renderer/Cinematic.cpp
@@ -34,8 +34,12 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "renderer/Cinematic.h"
 
+// DG: get rid of libjpeg; as far as I can tell no roqs that actually use it exist
+//#define ID_USE_LIBJPEG 1
+#ifdef ID_USE_LIBJPEG
 #include <jpeglib.h>
 #include <jerror.h>
+#endif
 
 #define CIN_system	1
 #define CIN_loop	2
@@ -939,8 +943,8 @@ unsigned short idCinematicLocal::yuv_to_rgb( int y, int u, int v ) {
 	g = (YY + ROQ_UG_tab[u] + ROQ_VG_tab[v]) >> 8;
 	b = (YY + ROQ_UB_tab[u]) >> 9;
 
-	if (r<0) r = 0; if (g<0) g = 0; if (b<0) b = 0;
-	if (r > 31) r = 31; if (g > 63) g = 63; if (b > 31) b = 31;
+	if (r < 0)  { r = 0;  } if (g < 0)  { g = 0;  } if (b < 0)  { b = 0;  }
+	if (r > 31) { r = 31; } if (g > 63) { g = 63; } if (b > 31) { b = 31; }
 
 	return (unsigned short)((r<<11)+(g<<5)+(b));
 }
@@ -957,8 +961,8 @@ unsigned int idCinematicLocal::yuv_to_rgb24( int y, int u, int v ) {
 	g = (YY + ROQ_UG_tab[u] + ROQ_VG_tab[v]) >> 6;
 	b = (YY + ROQ_UB_tab[u]) >> 6;
 
-	if (r<0) r = 0; if (g<0) g = 0; if (b<0) b = 0;
-	if (r > 255) r = 255; if (g > 255) g = 255; if (b > 255) b = 255;
+	if (r < 0)   { r = 0;   } if (g < 0)   { g = 0;   } if (b < 0)   { b = 0;   }
+	if (r > 255) { r = 255; } if (g > 255) { g = 255; } if (b > 255) { b = 255; }
 
 	return LittleInt((r)+(g<<8)+(b<<16));
 }
@@ -1284,11 +1288,19 @@ void idCinematicLocal::RoQReset() {
 	status = FMV_LOOPED;
 }
 
+#ifdef ID_USE_LIBJPEG
 /* jpeg error handling */
 struct jpeg_error_mgr jerr;
-
+#endif
 int JPEGBlit( byte *wStatus, byte *data, int datasize )
 {
+#ifndef ID_USE_LIBJPEG
+	// I don't think this code is actually used, because
+	// * the jpeg encoder parts in the roq encoder are disabled with #if 0
+	// * ffmpeg doesn't support ROQ_QUAD_JPEG and can decode all doom3 roqs anyway
+	common->Warning("Contrary to Daniel's assumption, JPEGBlit() is actually called! Please report that as a dhewm3 bug!\n");
+
+#else
   /* This struct contains the JPEG decompression parameters and pointers to
    * working space (which is allocated as needed by the JPEG library).
    */
@@ -1402,7 +1414,7 @@ int JPEGBlit( byte *wStatus, byte *data, int datasize )
   /* At this point you may want to check to see whether any corrupt-data
    * warnings occurred (test whether jerr.pub.num_warnings is nonzero).
    */
-
+#endif
   /* And we're done! */
   return 1;
 }
diff --git a/neo/renderer/GuiModel.cpp b/neo/renderer/GuiModel.cpp
index 08fdb2c..3af7f02 100644
--- a/neo/renderer/GuiModel.cpp
+++ b/neo/renderer/GuiModel.cpp
@@ -209,8 +209,14 @@ EmitToCurrentView
 void idGuiModel::EmitToCurrentView( float modelMatrix[16], bool depthHack ) {
 	float	modelViewMatrix[16];
 
-	myGlMultMatrix( modelMatrix, tr.viewDef->worldSpace.modelViewMatrix,
-			modelViewMatrix );
+	const float* worldMVM = tr.viewDef->worldSpace.modelViewMatrix;
+	// DG: for r_lockSurfaces use the real world modelViewMatrix
+	//     so GUIs don't float around
+	if(r_lockSurfaces.GetBool() && tr.viewDef == tr.primaryView) {
+		worldMVM = tr.lockSurfacesRealViewDef.worldSpace.modelViewMatrix;
+	}
+
+	myGlMultMatrix( modelMatrix, worldMVM, modelViewMatrix );
 
 	for ( int i = 0 ; i < surfaces.Num() ; i++ ) {
 		EmitSurface( &surfaces[i], modelMatrix, modelViewMatrix, depthHack );
diff --git a/neo/renderer/Image_files.cpp b/neo/renderer/Image_files.cpp
index 6d62e4e..8ea1dba 100644
--- a/neo/renderer/Image_files.cpp
+++ b/neo/renderer/Image_files.cpp
@@ -26,15 +26,21 @@ If you have questions concerning this license or the applicable additional terms
 ===========================================================================
 */
 
+// DG: replace libjpeg with stb_image.h because it causes fewer headaches
+// include this first, otherwise build breaks because of use_idStr_* #defines in Str.h
+#define STB_IMAGE_IMPLEMENTATION
+#define STBI_NO_HDR
+#define STBI_NO_LINEAR
+#define STBI_ONLY_JPEG // at least for now, only use it for JPEG
+#define STBI_NO_STDIO  // images are passed as buffers
+#include "stb_image.h"
+
 #include "sys/platform.h"
 
 #include "renderer/tr_local.h"
 
 #include "renderer/Image.h"
 
-#include <jpeglib.h>
-#include <jerror.h>
-
 /*
 
 This file only has a single entry point:
@@ -757,50 +763,16 @@ LoadJPG
 =============
 */
 static void LoadJPG( const char *filename, unsigned char **pic, int *width, int *height, ID_TIME_T *timestamp ) {
-  /* This struct contains the JPEG decompression parameters and pointers to
-   * working space (which is allocated as needed by the JPEG library).
-   */
-  struct jpeg_decompress_struct cinfo;
-  /* We use our private extension JPEG error handler.
-   * Note that this struct must live as long as the main JPEG parameter
-   * struct, to avoid dangling-pointer problems.
-   */
-  /* This struct represents a JPEG error handler.  It is declared separately
-   * because applications often want to supply a specialized error handler
-   * (see the second half of this file for an example).  But here we just
-   * take the easy way out and use the standard error handler, which will
-   * print a message on stderr and call exit() if compression fails.
-   * Note that this struct must live as long as the main JPEG parameter
-   * struct, to avoid dangling-pointer problems.
-   */
-  struct jpeg_error_mgr jerr;
-  /* More stuff */
-  JSAMPARRAY buffer;		/* Output row buffer */
-  int row_stride;		/* physical row width in output buffer */
-  unsigned char *out;
-  byte	*fbuffer;
-  byte  *bbuf;
-
-  /* In this example we want to open the input file before doing anything else,
-   * so that the setjmp() error recovery below can assume the file is open.
-   * VERY IMPORTANT: use "b" option to fopen() if you are on a machine that
-   * requires it in order to read binary files.
-   */
-
-	// JDC: because fill_input_buffer() blindly copies INPUT_BUF_SIZE bytes,
-	// we need to make sure the file buffer is padded or it may crash
-  if ( pic ) {
-	*pic = NULL;		// until proven otherwise
-  }
-
-	int len;
-	idFile *f;
-
-	f = fileSystem->OpenFileRead( filename );
+
+	if ( pic ) {
+		*pic = NULL;		// until proven otherwise
+	}
+
+	idFile *f = fileSystem->OpenFileRead( filename );
 	if ( !f ) {
 		return;
 	}
-	len = f->Length();
+	int len = f->Length();
 	if ( timestamp ) {
 		*timestamp = f->Timestamp();
 	}
@@ -808,120 +780,31 @@ static void LoadJPG( const char *filename, unsigned char **pic, int *width, int
 		fileSystem->CloseFile( f );
 		return;	// just getting timestamp
 	}
-	fbuffer = (byte *)Mem_ClearedAlloc( len + 4096 );
+	byte *fbuffer = (byte *)Mem_ClearedAlloc( len );
 	f->Read( fbuffer, len );
 	fileSystem->CloseFile( f );
 
-  /* Step 1: allocate and initialize JPEG decompression object */
-
-  /* We have to set up the error handler first, in case the initialization
-   * step fails.  (Unlikely, but it could happen if you are out of memory.)
-   * This routine fills in the contents of struct jerr, and returns jerr's
-   * address which we place into the link field in cinfo.
-   */
-  cinfo.err = jpeg_std_error(&jerr);
-
-  /* Now we can initialize the JPEG decompression object. */
-  jpeg_create_decompress(&cinfo);
-
-  /* Step 2: specify data source (eg, a file) */
-
-  jpeg_mem_src(&cinfo, fbuffer, len);
-
-  /* Step 3: read file parameters with jpeg_read_header() */
-
-  (void) jpeg_read_header(&cinfo, (boolean)true);
-  /* We can ignore the return value from jpeg_read_header since
-   *   (a) suspension is not possible with the stdio data source, and
-   *   (b) we passed TRUE to reject a tables-only JPEG file as an error.
-   * See libjpeg.doc for more info.
-   */
-
-  /* Step 4: set parameters for decompression */
-
-  /* In this example, we don't need to change any of the defaults set by
-   * jpeg_read_header(), so we do nothing here.
-   */
-
-  /* Step 5: Start decompressor */
-
-  (void) jpeg_start_decompress(&cinfo);
-  /* We can ignore the return value since suspension is not possible
-   * with the stdio data source.
-   */
-
-  /* We may need to do some setup of our own at this point before reading
-   * the data.  After jpeg_start_decompress() we have the correct scaled
-   * output image dimensions available, as well as the output colormap
-   * if we asked for color quantization.
-   * In this example, we need to make an output work buffer of the right size.
-   */
-  /* JSAMPLEs per row in output buffer */
-  row_stride = cinfo.output_width * cinfo.output_components;
-
-  if (cinfo.output_components!=4) {
-		common->DWarning( "JPG %s is unsupported color depth (%d)",
-			filename, cinfo.output_components);
-  }
-  out = (byte *)R_StaticAlloc(cinfo.output_width*cinfo.output_height*4);
-
-  *pic = out;
-  *width = cinfo.output_width;
-  *height = cinfo.output_height;
-
-  /* Step 6: while (scan lines remain to be read) */
-  /*           jpeg_read_scanlines(...); */
-
-  /* Here we use the library's state variable cinfo.output_scanline as the
-   * loop counter, so that we don't have to keep track ourselves.
-   */
-  while (cinfo.output_scanline < cinfo.output_height) {
-	/* jpeg_read_scanlines expects an array of pointers to scanlines.
-	 * Here the array is only one element long, but you could ask for
-	 * more than one scanline at a time if that's more convenient.
-	 */
-	bbuf = ((out+(row_stride*cinfo.output_scanline)));
-	buffer = &bbuf;
-	(void) jpeg_read_scanlines(&cinfo, buffer, 1);
-  }
-
-  // clear all the alphas to 255
-  {
-	  int	i, j;
-		byte	*buf;
-
-		buf = *pic;
-
-	  j = cinfo.output_width * cinfo.output_height * 4;
-	  for ( i = 3 ; i < j ; i+=4 ) {
-		  buf[i] = 255;
-	  }
-  }
-
-  /* Step 7: Finish decompression */
-
-  (void) jpeg_finish_decompress(&cinfo);
-  /* We can ignore the return value since suspension is not possible
-   * with the stdio data source.
-   */
-
-  /* Step 8: Release JPEG decompression object */
-
-  /* This is an important step since it will release a good deal of memory. */
-  jpeg_destroy_decompress(&cinfo);
-
-  /* After finish_decompress, we can close the input file.
-   * Here we postpone it until after no more JPEG errors are possible,
-   * so as to simplify the setjmp error logic above.  (Actually, I don't
-   * think that jpeg_destroy can do an error exit, but why assume anything...)
-   */
-  Mem_Free( fbuffer );
-
-  /* At this point you may want to check to see whether any corrupt-data
-   * warnings occurred (test whether jerr.pub.num_warnings is nonzero).
-   */
-
-  /* And we're done! */
+	int w=0, h=0, comp=0;
+	byte* decodedImageData = stbi_load_from_memory( fbuffer, len, &w, &h, &comp, 4 );
+
+	Mem_Free( fbuffer );
+
+	if ( decodedImageData == NULL ) {
+		common->Warning( "stb_image was unable to load JPG %s : %s\n",
+					filename, stbi_failure_reason());
+		return;
+	}
+
+	// *pic must be allocated with R_StaticAlloc(), but stb_image allocates with malloc()
+	// (and as there is no R_StaticRealloc(), #define STBI_MALLOC etc won't help)
+	// so the decoded data must be copied once
+	int size = w*h*4;
+	*pic = (byte *)R_StaticAlloc( size );
+	memcpy( *pic, decodedImageData, size );
+	*width = w;
+	*height = h;
+	// now that decodedImageData has been copied into *pic, it's not needed anymore
+	stbi_image_free( decodedImageData );
 }
 
 //===================================================================
diff --git a/neo/renderer/Image_load.cpp b/neo/renderer/Image_load.cpp
index 505793d..50e2e20 100644
--- a/neo/renderer/Image_load.cpp
+++ b/neo/renderer/Image_load.cpp
@@ -1373,9 +1373,15 @@ bool idImage::CheckPrecompressedImage( bool fullLoad ) {
 		return false;
 	}
 
+#if 0 // DG: no idea what this was exactly meant to achieve, but it's definitely a bad idea:
+	//     we might try to load the lower mipmap levels of the image, but we'd still have
+	//     to load the whole .dds file first.
+	//     What's even weirder: idImage::ShouldImageBePartiallyCached() returns false
+	//     if the file size is LESS THAN image_cacheMinK * 1024...
 	if ( !fullLoad && len > globalImages->image_cacheMinK.GetInteger() * 1024 ) {
 		len = globalImages->image_cacheMinK.GetInteger() * 1024;
 	}
+#endif
 
 	byte *data = (byte *)R_StaticAlloc( len );
 
diff --git a/neo/renderer/Model.cpp b/neo/renderer/Model.cpp
index ad927d2..3016bf2 100644
--- a/neo/renderer/Model.cpp
+++ b/neo/renderer/Model.cpp
@@ -814,8 +814,8 @@ bool idRenderModelStatic::ConvertASEToModelSurfaces( const struct aseModel_s *as
 	for ( objectNum = 0 ; objectNum < ase->objects.Num() ; objectNum++ ) {
 		object = ase->objects[objectNum];
 		mesh = &object->mesh;
-		material = ase->materials[object->materialRef];
-		im1 = declManager->FindMaterial( material->name );
+		material = (ase->materials.Num() > object->materialRef) ? ase->materials[object->materialRef] : NULL;
+		im1 = declManager->FindMaterial( material ? material->name : NULL );
 
 		bool normalsParsed = mesh->normalsParsed;
 
diff --git a/neo/renderer/RenderSystem.cpp b/neo/renderer/RenderSystem.cpp
index 08414a3..722cf1a 100644
--- a/neo/renderer/RenderSystem.cpp
+++ b/neo/renderer/RenderSystem.cpp
@@ -221,11 +221,6 @@ void	R_AddDrawViewCmd( viewDef_t *parms ) {
 
 	cmd->viewDef = parms;
 
-	if ( parms->viewEntitys ) {
-		// save the command for r_lockSurfaces debugging
-		tr.lockSurfacesCmd = *cmd;
-	}
-
 	tr.pc.c_numViews++;
 
 	R_ViewStatistics( parms );
@@ -235,43 +230,6 @@ void	R_AddDrawViewCmd( viewDef_t *parms ) {
 //=================================================================================
 
 
-/*
-======================
-R_LockSurfaceScene
-
-r_lockSurfaces allows a developer to move around
-without changing the composition of the scene, including
-culling.  The only thing that is modified is the
-view position and axis, no front end work is done at all
-
-
-Add the stored off command again, so the new rendering will use EXACTLY
-the same surfaces, including all the culling, even though the transformation
-matricies have been changed.  This allow the culling tightness to be
-evaluated interactively.
-======================
-*/
-void R_LockSurfaceScene( viewDef_t *parms ) {
-	drawSurfsCommand_t	*cmd;
-	viewEntity_t			*vModel;
-
-	// set the matrix for world space to eye space
-	R_SetViewMatrix( parms );
-	tr.lockSurfacesCmd.viewDef->worldSpace = parms->worldSpace;
-
-	// update the view origin and axis, and all
-	// the entity matricies
-	for( vModel = tr.lockSurfacesCmd.viewDef->viewEntitys ; vModel ; vModel = vModel->next ) {
-		myGlMultMatrix( vModel->modelMatrix,
-			tr.lockSurfacesCmd.viewDef->worldSpace.modelViewMatrix,
-			vModel->modelViewMatrix );
-	}
-
-	// add the stored off surface commands again
-	cmd = (drawSurfsCommand_t *)R_GetCommandBuffer( sizeof( *cmd ) );
-	*cmd = tr.lockSurfacesCmd;
-}
-
 /*
 =============
 R_CheckCvars
@@ -288,6 +246,20 @@ static void R_CheckCvars( void ) {
 		r_brightness.ClearModified();
 		R_SetColorMappings();
 	}
+
+	if ( r_gammaInShader.IsModified() ) {
+		r_gammaInShader.ClearModified();
+		// reload shaders so they either add or remove the code for setting gamma/brightness in shader
+		R_ReloadARBPrograms_f( idCmdArgs() );
+
+		if ( r_gammaInShader.GetBool() ) {
+			common->Printf( "Will apply r_gamma and r_brightness in shaders\n" );
+			GLimp_ResetGamma(); // reset hardware gamma
+		} else {
+			common->Printf( "Will apply r_gamma and r_brightness in hardware (possibly on all screens)\n" );
+			R_SetColorMappings();
+		}
+	}
 }
 
 /*
@@ -599,6 +571,18 @@ void idRenderSystemLocal::BeginFrame( int windowWidth, int windowHeight ) {
 		return;
 	}
 
+	// DG: r_lockSurfaces only works properly if r_useScissor is disabled
+	if ( r_lockSurfaces.IsModified() ) {
+		static bool origUseScissor = true;
+		r_lockSurfaces.ClearModified();
+		if ( r_lockSurfaces.GetBool() ) {
+			origUseScissor = r_useScissor.GetBool();
+			r_useScissor.SetBool( false );
+		} else {
+			r_useScissor.SetBool( origUseScissor );
+		}
+	} // DG end
+
 	// determine which back end we will use
 	SetBackEndRenderer();
 
@@ -715,6 +699,7 @@ void idRenderSystemLocal::EndFrame( int *frontEndMsec, int *backEndMsec ) {
 
 	// use the other buffers next frame, because another CPU
 	// may still be rendering into the current buffers
+
 	R_ToggleSmpFrame();
 
 	// we can now release the vertexes used this frame
diff --git a/neo/renderer/RenderSystem.h b/neo/renderer/RenderSystem.h
index 253b9eb..b57aad9 100644
--- a/neo/renderer/RenderSystem.h
+++ b/neo/renderer/RenderSystem.h
@@ -59,7 +59,7 @@ typedef struct glconfig_s {
 	int					maxTextureImageUnits;
 	float				maxTextureAnisotropy;
 
-	int					colorBits, depthBits, stencilBits;
+	int					colorBits, alphabits, depthBits, stencilBits;
 
 	bool				multitextureAvailable;
 	bool				textureCompressionAvailable;
@@ -88,6 +88,10 @@ typedef struct glconfig_s {
 	bool				allowARB2Path;
 
 	bool				isInitialized;
+
+	// DG: current video backend is known to need opaque default framebuffer
+	//     used if r_fillWindowAlphaChan == -1
+	bool				shouldFillWindowAlpha;
 } glconfig_t;
 
 
diff --git a/neo/renderer/RenderSystem_init.cpp b/neo/renderer/RenderSystem_init.cpp
index f14bb05..fa120d6 100644
--- a/neo/renderer/RenderSystem_init.cpp
+++ b/neo/renderer/RenderSystem_init.cpp
@@ -91,6 +91,7 @@ idCVar r_swapInterval( "r_swapInterval", "1", CVAR_RENDERER | CVAR_ARCHIVE | CVA
 
 idCVar r_gamma( "r_gamma", "1", CVAR_RENDERER | CVAR_ARCHIVE | CVAR_FLOAT, "changes gamma tables", 0.5f, 3.0f );
 idCVar r_brightness( "r_brightness", "1", CVAR_RENDERER | CVAR_ARCHIVE | CVAR_FLOAT, "changes gamma tables", 0.5f, 2.0f );
+idCVar r_gammaInShader( "r_gammaInShader", "1", CVAR_RENDERER | CVAR_ARCHIVE | CVAR_BOOL, "Set gamma and brightness in shaders instead of using hardware gamma" );
 
 idCVar r_renderer( "r_renderer", "best", CVAR_RENDERER | CVAR_ARCHIVE, "hardware specific renderer path to use", r_rendererArgs, idCmdSystem::ArgCompletion_String<r_rendererArgs> );
 
@@ -400,9 +401,13 @@ static void R_CheckPortableExtensions( void ) {
 	if( glConfig.glVersion >= 2.0) {
 		common->Printf( "... got GL2.0+ glStencilOpSeparate()\n" );
 		qglStencilOpSeparate = (PFNGLSTENCILOPSEPARATEPROC)GLimp_ExtensionPointer( "glStencilOpSeparate" );
+	} else if( R_CheckExtension( "GL_ATI_separate_stencil" ) ) {
+		common->Printf( "... got glStencilOpSeparateATI() (GL_ATI_separate_stencil)\n" );
+		// the ATI version of glStencilOpSeparate() has the same signature and should also
+		// behave identically to the GL2 version (in Mesa3D it's just an alias)
+		qglStencilOpSeparate = (PFNGLSTENCILOPSEPARATEPROC)GLimp_ExtensionPointer( "glStencilOpSeparateATI" );
 	} else {
-		// TODO: there was an extension by ATI providing glStencilOpSeparateATI - do we care?
-		common->Printf( "... don't have GL2.0+ glStencilOpSeparate()\n" );
+		common->Printf( "... don't have glStencilOpSeparateATI() or (GL2.0+) glStencilOpSeparate()\n" );
 		qglStencilOpSeparate = NULL;
 	}
 
@@ -733,7 +738,13 @@ void R_InitOpenGL( void ) {
 	R_InitFrameData();
 
 	// Reset our gamma
-	R_SetColorMappings();
+	r_gammaInShader.ClearModified();
+	if ( r_gammaInShader.GetBool() ) {
+		common->Printf( "Will apply r_gamma and r_brightness in shaders (r_gammaInShader 1)\n" );
+	} else {
+		common->Printf( "Will apply r_gamma and r_brightness in hardware (possibly on all screens; r_gammaInShader 0)\n" );
+		R_SetColorMappings();
+	}
 
 #ifdef _WIN32
 	static bool glCheck = false;
@@ -1757,6 +1768,12 @@ R_SetColorMappings
 ===============
 */
 void R_SetColorMappings( void ) {
+
+	if ( r_gammaInShader.GetBool() ) {
+		// nothing to do here
+		return;
+	}
+
 	int		i, j;
 	float	g, b;
 	int		inf;
@@ -2156,6 +2173,8 @@ idRenderSystemLocal::Shutdown
 void idRenderSystemLocal::Shutdown( void ) {
 	common->Printf( "idRenderSystem::Shutdown()\n" );
 
+	common->SetRefreshOnPrint( false ); // without a renderer there's nothing to refresh
+
 	R_DoneFreeType( );
 
 	if ( glConfig.isInitialized ) {
diff --git a/neo/renderer/RenderWorld.cpp b/neo/renderer/RenderWorld.cpp
index 227c96d..1fc6e3b 100644
--- a/neo/renderer/RenderWorld.cpp
+++ b/neo/renderer/RenderWorld.cpp
@@ -675,6 +675,8 @@ Rendering a scene may require multiple views to be rendered
 to handle mirrors,
 ====================
 */
+extern void R_SetupViewFrustum( viewDef_t* viewDef );
+extern void R_SetupProjection( viewDef_t * viewDef );
 void idRenderWorldLocal::RenderScene( const renderView_t *renderView ) {
 #ifndef	ID_DEDICATED
 	renderView_t	copy;
@@ -742,10 +744,32 @@ void idRenderWorldLocal::RenderScene( const renderView_t *renderView ) {
 	}
 
 	if ( r_lockSurfaces.GetBool() ) {
-		R_LockSurfaceScene( parms );
-		return;
+		tr.lockSurfacesRealViewDef = *parms;
+
+		// usually the following are called later in R_RenderView(), but we pass
+		// the locked viewDef to that function so do these calculations here
+		// (the results are needed for some special cases like in-world GUIs and mirrors)
+		R_SetViewMatrix( &tr.lockSurfacesRealViewDef );
+		R_SetupViewFrustum( &tr.lockSurfacesRealViewDef);
+		R_SetupProjection( &tr.lockSurfacesRealViewDef );
+
+		const viewDef_t* origParms = &tr.lockSurfacesRealViewDef;
+		*parms = tr.lockSurfacesViewDef;
+		parms->renderWorld = origParms->renderWorld;
+		parms->floatTime = origParms->floatTime;
+		parms->drawSurfs = origParms->drawSurfs; // should be NULL I think
+		parms->numDrawSurfs = origParms->numDrawSurfs;
+		parms->maxDrawSurfs = origParms->maxDrawSurfs;
+		parms->viewLights = origParms->viewLights;
+		parms->viewEntitys = origParms->viewEntitys;
+		parms->connectedAreas = origParms->connectedAreas;
+
+	} else {
+		// save current viewDef so it can be used if we enable r_lockSurfaces in the next frame
+		tr.lockSurfacesViewDef = *parms;
 	}
 
+
 	// save this world for use by some console commands
 	tr.primaryWorld = this;
 	tr.primaryRenderView = *renderView;
diff --git a/neo/renderer/RenderWorld_demo.cpp b/neo/renderer/RenderWorld_demo.cpp
index 32ad265..db94772 100644
--- a/neo/renderer/RenderWorld_demo.cpp
+++ b/neo/renderer/RenderWorld_demo.cpp
@@ -277,7 +277,10 @@ void	idRenderWorldLocal::WriteLoadMap() {
 	session->writeDemo->WriteInt( DC_LOADMAP );
 
 	demoHeader_t	header;
+	// DG: Note: here strncpy() makes sense, because all chars of mapname get written
+	//     so it's good if the ones behind the actual name are *all* \0
 	strncpy( header.mapname, mapName.c_str(), sizeof( header.mapname ) - 1 );
+	header.mapname[sizeof( header.mapname ) - 1] = '\0'; // make sure the last char is also \0
 	header.version = 4;
 	header.sizeofRenderEntity = sizeof( renderEntity_t );
 	header.sizeofRenderLight = sizeof( renderLight_t );
diff --git a/neo/renderer/draw_arb2.cpp b/neo/renderer/draw_arb2.cpp
index 8aaa97d..3f7f908 100644
--- a/neo/renderer/draw_arb2.cpp
+++ b/neo/renderer/draw_arb2.cpp
@@ -98,6 +98,15 @@ void	RB_ARB2_DrawInteraction( const drawInteraction_t *din ) {
 	qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 0, din->diffuseColor.ToFloatPtr() );
 	qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 1, din->specularColor.ToFloatPtr() );
 
+	// DG: brightness and gamma in shader as program.env[4]
+	if ( r_gammaInShader.GetBool() ) {
+		// program.env[4].xyz are all r_brightness, program.env[4].w is 1.0/r_gamma
+		float parm[4];
+		parm[0] = parm[1] = parm[2] = r_brightness.GetFloat();
+		parm[3] = 1.0/r_gamma.GetFloat(); // 1.0/gamma so the shader doesn't have to do this calculation
+		qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 4, parm );
+	}
+
 	// set the textures
 
 	// texture 1 will be the per-surface bump map
@@ -344,6 +353,41 @@ static progDef_t	progs[MAX_GLPROGS] = {
 R_LoadARBProgram
 =================
 */
+
+static char* findLineThatStartsWith( char* text, const char* findMe ) {
+	char* res = strstr( text, findMe );
+	while ( res != NULL ) {
+		// skip whitespace before match, if any
+		char* cur = res;
+		if ( cur > text ) {
+			--cur;
+		}
+		while ( cur > text && ( *cur == ' ' || *cur == '\t' ) ) {
+			--cur;
+		}
+		// now we should be at a newline (or at the beginning)
+		if ( cur == text ) {
+			return cur;
+		}
+		if ( *cur == '\n' || *cur == '\r' ) {
+			return cur+1;
+		}
+		// otherwise maybe we're in commented out text or whatever, search on
+		res = strstr( res+1, findMe );
+	}
+	return NULL;
+}
+
+static ID_INLINE bool isARBidentifierChar( int c ) {
+	// according to chapter 3.11.2 in ARB_fragment_program.txt identifiers can only
+	// contain these chars (first char mustn't be a number, but that doesn't matter here)
+	// NOTE: isalnum() or isalpha() apparently doesn't work, as it also matches spaces (?!)
+	return  c == '$' || c == '_'
+	      || (c >= '0' && c <= '9')
+	      || (c >= 'A' && c <= 'Z')
+	      || (c >= 'a' && c <= 'z');
+}
+
 void R_LoadARBProgram( int progIndex ) {
 	int		ofs;
 	int		err;
@@ -409,6 +453,119 @@ void R_LoadARBProgram( int progIndex ) {
 	}
 	end[3] = 0;
 
+	// DG: hack gamma correction into shader
+	if ( r_gammaInShader.GetBool() && progs[progIndex].target == GL_FRAGMENT_PROGRAM_ARB ) {
+
+		// note that strlen("dhewm3tmpres") == strlen("result.color")
+		const char* tmpres = "TEMP dhewm3tmpres; # injected by dhewm3 for gamma correction\n";
+
+		// Note: program.env[4].xyz = r_brightness; program.env[4].w = 1.0/r_gamma
+		// outColor.rgb = pow(dhewm3tmpres.rgb*r_brightness, vec3(1.0/r_gamma))
+		// outColor.a = dhewm3tmpres.a;
+		const char* extraLines =
+			"# gamma correction in shader, injected by dhewm3 \n"
+			// MUL_SAT clamps the result to [0, 1] - it must not be negative because
+			// POW might not work with a negative base (it looks wrong with intel's Linux driver)
+			// and clamping values >1 to 1 is ok because when writing to result.color
+			// it's clamped anyway and pow(base, exp) is always >= 1 for base >= 1 and exp >= 0
+			"MUL_SAT dhewm3tmpres.xyz, program.env[4], dhewm3tmpres;\n" // first multiply with brightness
+			"POW result.color.x, dhewm3tmpres.x, program.env[4].w;\n" // then do pow(dhewm3tmpres.xyz, vec3(1/gamma))
+			"POW result.color.y, dhewm3tmpres.y, program.env[4].w;\n" // (apparently POW only supports scalars, not whole vectors)
+			"POW result.color.z, dhewm3tmpres.z, program.env[4].w;\n"
+			"MOV result.color.w, dhewm3tmpres.w;\n" // alpha remains unmodified
+			"\nEND\n\n"; // we add this block right at the end, replacing the original "END" string
+
+		int fullLen = strlen( start ) + strlen( tmpres ) + strlen( extraLines );
+		char* outStr = (char*)_alloca( fullLen + 1 );
+
+		// add tmpres right after OPTION line (if any)
+		char* insertPos = findLineThatStartsWith( start, "OPTION" );
+		if ( insertPos == NULL ) {
+			// no OPTION? then just put it after the first line (usually something like "!!ARBfp1.0\n")
+			insertPos = start;
+		}
+		// but we want the position *after* that line
+		while( *insertPos != '\0' && *insertPos != '\n' && *insertPos != '\r' ) {
+			++insertPos;
+		}
+		// skip the newline character(s) as well
+		while( *insertPos == '\n' || *insertPos == '\r' ) {
+			++insertPos;
+		}
+
+		// copy text up to insertPos
+		int curLen = insertPos-start;
+		memcpy( outStr, start, curLen );
+		// copy tmpres ("TEMP dhewm3tmpres; # ..")
+		memcpy( outStr+curLen, tmpres, strlen( tmpres ) );
+		curLen += strlen( tmpres );
+		// copy remaining original shader up to (excluding) "END"
+		int remLen = end - insertPos;
+		memcpy( outStr+curLen, insertPos, remLen );
+		curLen += remLen;
+
+		outStr[curLen] = '\0'; // make sure it's NUL-terminated so normal string functions work
+
+		// replace all existing occurrences of "result.color" with "dhewm3tmpres"
+		for( char* resCol = strstr( outStr, "result.color" );
+		     resCol != NULL; resCol = strstr( resCol+12, "result.color" ) ) {
+			memcpy( resCol, "dhewm3tmpres", 12 ); // both strings have the same length.
+
+			// if this was part of "OUTPUT bla = result.color;", replace
+			// "OUTPUT bla" with "ALIAS  bla" (so it becomes "ALIAS  bla = dhewm3tmpres;")
+			{
+				char* s = resCol - 1;
+				// first skip whitespace before "result.color"
+				while( s > outStr && (*s == ' ' || *s == '\t') ) {
+					--s;
+				}
+				// if there's no '=' before result.color, this line can't be affected
+				if ( *s != '=' || s <= outStr + 8 ) {
+					continue; // go on with next "result.color" in the for-loop
+				}
+				--s; // we were on '=', so go to the char before and it's time to skip whitespace again
+				while( s > outStr && ( *s == ' ' || *s == '\t' ) ) {
+					--s;
+				}
+				// now we should be at the end of "bla" (or however the variable/alias is called)
+				if ( s <= outStr+7 || !isARBidentifierChar( *s ) ) {
+					continue;
+				}
+				--s;
+				// skip all the remaining chars that are legal in identifiers
+				while( s > outStr && isARBidentifierChar( *s ) ) {
+					--s;
+				}
+				// there should be at least one space/tab between "OUTPUT" and "bla"
+				if ( s <= outStr + 6 || ( *s != ' ' && *s != '\t' ) ) {
+					continue;
+				}
+				--s;
+				// skip remaining whitespace (if any)
+				while( s > outStr && ( *s == ' ' || *s == '\t' ) ) {
+					--s;
+				}
+				// now we should be at "OUTPUT" (specifically at its last 'T'),
+				// if this is indeed such a case
+				if ( s <= outStr + 5 || *s != 'T' ) {
+					continue;
+				}
+				s -= 5; // skip to start of "OUTPUT", if this is indeed "OUTPUT"
+				if ( idStr::Cmpn( s, "OUTPUT", 6 ) == 0 ) {
+					// it really is "OUTPUT" => replace "OUTPUT" with "ALIAS "
+					memcpy(s, "ALIAS ", 6);
+				}
+			}
+		}
+
+		assert( curLen + strlen( extraLines ) <= fullLen );
+
+		// now add extraLines that calculate and set a gamma-corrected result.color
+		// strcat() should be safe because fullLen was calculated taking all parts into account
+		strcat( outStr, extraLines );
+		start = outStr;
+	}
+
 	qglBindProgramARB( progs[progIndex].target, progs[progIndex].ident );
 	qglGetError();
 
@@ -425,7 +582,8 @@ void R_LoadARBProgram( int progIndex ) {
 		} else if ( ofs >= (int)strlen( start ) ) {
 			common->Printf( "error at end of program\n" );
 		} else {
-			common->Printf( "error at %i:\n%s", ofs, start + ofs );
+			int printOfs = Max( ofs - 20, 0 ); // DG: print some more context
+			common->Printf( "error at %i:\n%s", ofs, start + printOfs );
 		}
 		return;
 	}
@@ -472,7 +630,7 @@ int R_FindARBProgram( GLenum target, const char *program ) {
 	// add it to the list and load it
 	progs[i].ident = (program_t)0;	// will be gen'd by R_LoadARBProgram
 	progs[i].target = target;
-	strncpy( progs[i].name, program, sizeof( progs[i].name ) - 1 );
+	idStr::Copynz( progs[i].name, program, sizeof( progs[i].name ) );
 
 	R_LoadARBProgram( i );
 
diff --git a/neo/renderer/draw_common.cpp b/neo/renderer/draw_common.cpp
index b7932b8..06d5037 100644
--- a/neo/renderer/draw_common.cpp
+++ b/neo/renderer/draw_common.cpp
@@ -577,7 +577,7 @@ RB_SetProgramEnvironment
 Sets variables that can be used by all vertex programs
 ==================
 */
-void RB_SetProgramEnvironment( void ) {
+void RB_SetProgramEnvironment( bool isPostProcess ) {
 	float	parm[4];
 	int		pot;
 
@@ -632,6 +632,20 @@ void RB_SetProgramEnvironment( void ) {
 	parm[3] = 1;
 	qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 1, parm );
 
+	// DG: brightness and gamma in shader as program.env[4]
+	if ( r_gammaInShader.GetBool() ) {
+		// program.env[4].xyz are all r_brightness, program.env[4].w is 1.0/r_gamma
+		if ( !isPostProcess ) {
+			parm[0] = parm[1] = parm[2] = r_brightness.GetFloat();
+			parm[3] = 1.0/r_gamma.GetFloat(); // 1.0/gamma so the shader doesn't have to do this calculation
+		} else {
+			// don't apply gamma/brightness in postprocess passes to avoid applying them twice
+			// (setting them to 1.0 makes them no-ops)
+			parm[0] = parm[1] = parm[2] = parm[3] = 1.0f;
+		}
+		qglProgramEnvParameter4fvARB( GL_FRAGMENT_PROGRAM_ARB, 4, parm );
+	}
+
 	//
 	// set eye position in global space
 	//
@@ -978,12 +992,15 @@ int RB_STD_DrawShaderPasses( drawSurf_t **drawSurfs, int numDrawSurfs ) {
 		return numDrawSurfs;
 	}
 
+	bool isPostProcess = false;
+
 	// if we are about to draw the first surface that needs
 	// the rendering in a texture, copy it over
 	if ( drawSurfs[0]->material->GetSort() >= SS_POST_PROCESS ) {
 		if ( r_skipPostProcess.GetBool() ) {
 			return 0;
 		}
+		isPostProcess = true;
 
 		// only dump if in a 3d view
 		if ( backEnd.viewDef->viewEntitys && tr.backEndRenderer == BE_ARB2 ) {
@@ -1000,7 +1017,7 @@ int RB_STD_DrawShaderPasses( drawSurf_t **drawSurfs, int numDrawSurfs ) {
 	GL_SelectTexture( 0 );
 	qglEnableClientState( GL_TEXTURE_COORD_ARRAY );
 
-	RB_SetProgramEnvironment();
+	RB_SetProgramEnvironment( isPostProcess );
 
 	// we don't use RB_RenderDrawSurfListWithFunction()
 	// because we want to defer the matrix load because many
diff --git a/neo/renderer/qgl.h b/neo/renderer/qgl.h
index 0cf2715..846a0d7 100644
--- a/neo/renderer/qgl.h
+++ b/neo/renderer/qgl.h
@@ -96,6 +96,10 @@ extern	void ( APIENTRY *qglColorTableEXT)( int, int, int, int, int, const void *
 extern	PFNGLACTIVESTENCILFACEEXTPROC	qglActiveStencilFaceEXT;
 
 // DG: couldn't find any extension for this, it's supported in GL2.0 and newer, incl OpenGL ES2.0
+// SE: work around missing function definition on legacy Mac OS X versions
+#if defined(OSX_TIGER) || defined(OSX_LEOPARD)
+typedef void (APIENTRYP PFNGLSTENCILOPSEPARATEPROC) (GLenum face, GLenum sfail, GLenum dpfail, GLenum dppass);
+#endif
 extern PFNGLSTENCILOPSEPARATEPROC qglStencilOpSeparate;
 
 // ARB_texture_compression
@@ -117,11 +121,8 @@ extern PFNGLDEPTHBOUNDSEXTPROC              qglDepthBoundsEXT;
 
 #if defined( _WIN32 ) && defined(ID_ALLOW_TOOLS)
 
-extern  int   (WINAPI * qwglChoosePixelFormat)(HDC, CONST PIXELFORMATDESCRIPTOR *);
-extern  int   (WINAPI * qwglDescribePixelFormat) (HDC, int, UINT, LPPIXELFORMATDESCRIPTOR);
-extern  int   (WINAPI * qwglGetPixelFormat)(HDC);
-extern  BOOL(WINAPI * qwglSetPixelFormat)(HDC, int, CONST PIXELFORMATDESCRIPTOR *);
 extern  BOOL(WINAPI * qwglSwapBuffers)(HDC);
+extern int Win_ChoosePixelFormat(HDC hdc);
 
 extern BOOL(WINAPI * qwglCopyContext)(HGLRC, HGLRC, UINT);
 extern HGLRC(WINAPI * qwglCreateContext)(HDC);
diff --git a/neo/renderer/qgl_proc.h b/neo/renderer/qgl_proc.h
index 54ef43d..af0146d 100644
--- a/neo/renderer/qgl_proc.h
+++ b/neo/renderer/qgl_proc.h
@@ -38,6 +38,7 @@ QGLPROC(glBegin, void, (GLenum mode))
 QGLPROC(glBindTexture, void, (GLenum target, GLuint texture))
 QGLPROC(glBitmap, void, (GLsizei width, GLsizei height, GLfloat xorig, GLfloat yorig, GLfloat xmove, GLfloat ymove, const GLubyte *bitmap))
 QGLPROC(glBlendFunc, void, (GLenum sfactor, GLenum dfactor))
+QGLPROC(glBlendEquation, void, (GLenum mode))
 QGLPROC(glCallList, void, (GLuint list))
 QGLPROC(glCallLists, void, (GLsizei n, GLenum type, const GLvoid *lists))
 QGLPROC(glClear, void, (GLbitfield mask))
diff --git a/neo/renderer/stb_image.h b/neo/renderer/stb_image.h
new file mode 100644
index 0000000..d60371b
--- /dev/null
+++ b/neo/renderer/stb_image.h
@@ -0,0 +1,7897 @@
+/* stb_image - v2.27 - public domain image loader - http://nothings.org/stb
+                                  no warranty implied; use at your own risk
+
+   Do this:
+      #define STB_IMAGE_IMPLEMENTATION
+   before you include this file in *one* C or C++ file to create the implementation.
+
+   // i.e. it should look like this:
+   #include ...
+   #include ...
+   #include ...
+   #define STB_IMAGE_IMPLEMENTATION
+   #include "stb_image.h"
+
+   You can #define STBI_ASSERT(x) before the #include to avoid using assert.h.
+   And #define STBI_MALLOC, STBI_REALLOC, and STBI_FREE to avoid using malloc,realloc,free
+
+
+   QUICK NOTES:
+      Primarily of interest to game developers and other people who can
+          avoid problematic images and only need the trivial interface
+
+      JPEG baseline & progressive (12 bpc/arithmetic not supported, same as stock IJG lib)
+      PNG 1/2/4/8/16-bit-per-channel
+
+      TGA (not sure what subset, if a subset)
+      BMP non-1bpp, non-RLE
+      PSD (composited view only, no extra channels, 8/16 bit-per-channel)
+
+      GIF (*comp always reports as 4-channel)
+      HDR (radiance rgbE format)
+      PIC (Softimage PIC)
+      PNM (PPM and PGM binary only)
+
+      Animated GIF still needs a proper API, but here's one way to do it:
+          http://gist.github.com/urraka/685d9a6340b26b830d49
+
+      - decode from memory or through FILE (define STBI_NO_STDIO to remove code)
+      - decode from arbitrary I/O callbacks
+      - SIMD acceleration on x86/x64 (SSE2) and ARM (NEON)
+
+   Full documentation under "DOCUMENTATION" below.
+
+
+LICENSE
+
+  See end of file for license information.
+
+RECENT REVISION HISTORY:
+
+      2.27  (2021-07-11) document stbi_info better, 16-bit PNM support, bug fixes
+      2.26  (2020-07-13) many minor fixes
+      2.25  (2020-02-02) fix warnings
+      2.24  (2020-02-02) fix warnings; thread-local failure_reason and flip_vertically
+      2.23  (2019-08-11) fix clang static analysis warning
+      2.22  (2019-03-04) gif fixes, fix warnings
+      2.21  (2019-02-25) fix typo in comment
+      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
+      2.19  (2018-02-11) fix warning
+      2.18  (2018-01-30) fix warnings
+      2.17  (2018-01-29) bugfix, 1-bit BMP, 16-bitness query, fix warnings
+      2.16  (2017-07-23) all functions have 16-bit variants; optimizations; bugfixes
+      2.15  (2017-03-18) fix png-1,2,4; all Imagenet JPGs; no runtime SSE detection on GCC
+      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
+      2.13  (2016-12-04) experimental 16-bit API, only for PNG so far; fixes
+      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
+      2.11  (2016-04-02) 16-bit PNGS; enable SSE2 in non-gcc x64
+                         RGB-format JPEG; remove white matting in PSD;
+                         allocate large structures on the stack;
+                         correct channel count for PNG & BMP
+      2.10  (2016-01-22) avoid warning introduced in 2.09
+      2.09  (2016-01-16) 16-bit TGA; comments in PNM files; STBI_REALLOC_SIZED
+
+   See end of file for full revision history.
+
+
+ ============================    Contributors    =========================
+
+ Image formats                          Extensions, features
+    Sean Barrett (jpeg, png, bmp)          Jetro Lauha (stbi_info)
+    Nicolas Schulz (hdr, psd)              Martin "SpartanJ" Golini (stbi_info)
+    Jonathan Dummer (tga)                  James "moose2000" Brown (iPhone PNG)
+    Jean-Marc Lienher (gif)                Ben "Disch" Wenger (io callbacks)
+    Tom Seddon (pic)                       Omar Cornut (1/2/4-bit PNG)
+    Thatcher Ulrich (psd)                  Nicolas Guillemot (vertical flip)
+    Ken Miller (pgm, ppm)                  Richard Mitton (16-bit PSD)
+    github:urraka (animated gif)           Junggon Kim (PNM comments)
+    Christopher Forseth (animated gif)     Daniel Gibson (16-bit TGA)
+                                           socks-the-fox (16-bit PNG)
+                                           Jeremy Sawicki (handle all ImageNet JPGs)
+ Optimizations & bugfixes                  Mikhail Morozov (1-bit BMP)
+    Fabian "ryg" Giesen                    Anael Seghezzi (is-16-bit query)
+    Arseny Kapoulkine                      Simon Breuss (16-bit PNM)
+    John-Mark Allen
+    Carmelo J Fdez-Aguera
+
+ Bug & warning fixes
+    Marc LeBlanc            David Woo          Guillaume George     Martins Mozeiko
+    Christpher Lloyd        Jerry Jansson      Joseph Thomson       Blazej Dariusz Roszkowski
+    Phil Jordan                                Dave Moore           Roy Eltham
+    Hayaki Saito            Nathan Reed        Won Chun
+    Luke Graham             Johan Duparc       Nick Verigakis       the Horde3D community
+    Thomas Ruf              Ronny Chevalier                         github:rlyeh
+    Janez Zemva             John Bartholomew   Michal Cichon        github:romigrou
+    Jonathan Blow           Ken Hamada         Tero Hanninen        github:svdijk
+    Eugene Golushkov        Laurent Gomila     Cort Stratton        github:snagar
+    Aruelien Pocheville     Sergio Gonzalez    Thibault Reuille     github:Zelex
+    Cass Everitt            Ryamond Barbiero                        github:grim210
+    Paul Du Bois            Engin Manap        Aldo Culquicondor    github:sammyhw
+    Philipp Wiesemann       Dale Weiler        Oriol Ferrer Mesia   github:phprus
+    Josh Tobin                                 Matthew Gregan       github:poppolopoppo
+    Julian Raschke          Gregory Mullen     Christian Floisand   github:darealshinji
+    Baldur Karlsson         Kevin Schmidt      JR Smith             github:Michaelangel007
+                            Brad Weinberger    Matvey Cherevko      github:mosra
+    Luca Sas                Alexander Veselov  Zack Middleton       [reserved]
+    Ryan C. Gordon          [reserved]                              [reserved]
+                     DO NOT ADD YOUR NAME HERE
+
+                     Jacko Dirks
+
+  To add your name to the credits, pick a random blank space in the middle and fill it.
+  80% of merge conflicts on stb PRs are due to people adding their name at the end
+  of the credits.
+*/
+
+#ifndef STBI_INCLUDE_STB_IMAGE_H
+#define STBI_INCLUDE_STB_IMAGE_H
+
+// DOCUMENTATION
+//
+// Limitations:
+//    - no 12-bit-per-channel JPEG
+//    - no JPEGs with arithmetic coding
+//    - GIF always returns *comp=4
+//
+// Basic usage (see HDR discussion below for HDR usage):
+//    int x,y,n;
+//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+//    // ... process data if not NULL ...
+//    // ... x = width, y = height, n = # 8-bit components per pixel ...
+//    // ... replace '0' with '1'..'4' to force that many components per pixel
+//    // ... but 'n' will always be the number that it would have been if you said 0
+//    stbi_image_free(data);
+//
+// Standard parameters:
+//    int *x                 -- outputs image width in pixels
+//    int *y                 -- outputs image height in pixels
+//    int *channels_in_file  -- outputs # of image components in image file
+//    int desired_channels   -- if non-zero, # of image components requested in result
+//
+// The return value from an image loader is an 'unsigned char *' which points
+// to the pixel data, or NULL on an allocation failure or if the image is
+// corrupt or invalid. The pixel data consists of *y scanlines of *x pixels,
+// with each pixel consisting of N interleaved 8-bit components; the first
+// pixel pointed to is top-left-most in the image. There is no padding between
+// image scanlines or between pixels, regardless of format. The number of
+// components N is 'desired_channels' if desired_channels is non-zero, or
+// *channels_in_file otherwise. If desired_channels is non-zero,
+// *channels_in_file has the number of components that _would_ have been
+// output otherwise. E.g. if you set desired_channels to 4, you will always
+// get RGBA output, but you can check *channels_in_file to see if it's trivially
+// opaque because e.g. there were only 3 channels in the source image.
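+//
+// As a rough sketch of that layout (the names px, py and c are illustrative,
+// not part of the API), component c of the pixel at column px, row py in an
+// image of width x with N components per pixel can be read as:
+//
+//    unsigned char v = data[((size_t) py * x + px) * N + c];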
+//
+// An output image with N components has the following components interleaved
+// in this order in each pixel:
+//
+//     N=#comp     components
+//       1           grey
+//       2           grey, alpha
+//       3           red, green, blue
+//       4           red, green, blue, alpha
+//
+// If image loading fails for any reason, the return value will be NULL,
+// and *x, *y, *channels_in_file will be unchanged. The function
+// stbi_failure_reason() can be queried for an extremely brief, end-user
+// unfriendly explanation of why the load failed. Define STBI_NO_FAILURE_STRINGS
+// to avoid compiling these strings at all, and STBI_FAILURE_USERMSG to get slightly
+// more user-friendly ones.
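+//
+// A minimal sketch of reporting a failed load (assumes <stdio.h> is included
+// and that failure strings have not been compiled out):
+//
+//    unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+//    if (data == NULL)
+//       fprintf(stderr, "could not load %s: %s\n", filename, stbi_failure_reason());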
+//
+// Paletted PNG, BMP, GIF, and PIC images are automatically depalettized.
+//
+// To query the width, height and component count of an image without having to
+// decode the full file, you can use the stbi_info family of functions:
+//
+//   int x,y,n,ok;
+//   ok = stbi_info(filename, &x, &y, &n);
+//   // returns ok=1 and sets x, y, n if image is a supported format,
+//   // 0 otherwise.
+//
+// Note that stb_image pervasively uses ints in its public API for sizes,
+// including sizes of memory buffers. This is now part of the API and thus
+// hard to change without causing breakage. As a result, the various image
+// loaders all have certain limits on image size; these differ somewhat
+// by format but generally boil down to either just under 2GB or just under
+// 1GB. When the decoded image would be larger than this, stb_image decoding
+// will fail.
+//
+// Additionally, stb_image will reject image files that have any of their
+// dimensions set to a larger value than the configurable STBI_MAX_DIMENSIONS,
+// which defaults to 2**24 = 16777216 pixels. Due to the above memory limit,
+// the only way to have an image with such dimensions load correctly
+// is for it to have a rather extreme aspect ratio. Either way, the
+// assumption here is that such larger images are likely to be malformed
+// or malicious. If you do need to load an image with individual dimensions
+// larger than that, and it still fits in the overall size limit, you can
+// #define STBI_MAX_DIMENSIONS on your own to be something larger.
+//
+// ===========================================================================
+//
+// UNICODE:
+//
+//   If compiling for Windows and you wish to use Unicode filenames, compile
+//   with
+//       #define STBI_WINDOWS_UTF8
+//   and pass utf8-encoded filenames. Call stbi_convert_wchar_to_utf8 to convert
+//   Windows wchar_t filenames to utf8.
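+//
+//   A sketch of that conversion, assuming a wchar_t string 'wide_name' and an
+//   arbitrarily chosen buffer size (both illustrative); the return value is
+//   assumed here to be non-zero on success:
+//
+//       char utf8_name[1024];
+//       if (stbi_convert_wchar_to_utf8(utf8_name, sizeof(utf8_name), wide_name))
+//          data = stbi_load(utf8_name, &x, &y, &n, 0);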
+//
+// ===========================================================================
+//
+// Philosophy
+//
+// stb libraries are designed with the following priorities:
+//
+//    1. easy to use
+//    2. easy to maintain
+//    3. good performance
+//
+// Sometimes I let "good performance" creep up in priority over "easy to maintain",
+// and for best performance I may provide less-easy-to-use APIs that give higher
+// performance, in addition to the easy-to-use ones. Nevertheless, it's important
+// to keep in mind that from the standpoint of you, a client of this library,
+// all you care about is #1 and #3, and stb libraries DO NOT emphasize #3 above all.
+//
+// Some secondary priorities arise directly from the first two, some of which
+// provide more explicit reasons why performance can't be emphasized.
+//
+//    - Portable ("ease of use")
+//    - Small source code footprint ("easy to maintain")
+//    - No dependencies ("ease of use")
+//
+// ===========================================================================
+//
+// I/O callbacks
+//
+// I/O callbacks allow you to read from arbitrary sources, like packaged
+// files or some other source. Data read from callbacks are processed
+// through a small internal buffer (currently 128 bytes) to try to reduce
+// overhead.
+//
+// The three functions you must define are "read" (reads some bytes of data),
+// "skip" (skips some bytes of data), "eof" (reports if the stream is at the end).
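+//
+// A minimal sketch of a callback set that reads from a block of memory. The
+// 'mem_reader' type, the 'mem_*' functions and the 'bytes'/'num_bytes'
+// variables are hypothetical names, not part of stb_image; <string.h> is
+// assumed for memcpy:
+//
+//    typedef struct { const unsigned char *data; int size, pos; } mem_reader;
+//
+//    static int mem_read(void *user, char *out, int size)
+//    {
+//       mem_reader *r = (mem_reader *) user;
+//       int left = r->size - r->pos;
+//       if (size > left) size = left;       // never read past the end
+//       memcpy(out, r->data + r->pos, size);
+//       r->pos += size;
+//       return size;                        // number of bytes actually read
+//    }
+//
+//    static void mem_skip(void *user, int n)   // n may be negative
+//    {
+//       mem_reader *r = (mem_reader *) user;
+//       r->pos += n;
+//       if (r->pos < 0)       r->pos = 0;
+//       if (r->pos > r->size) r->pos = r->size;
+//    }
+//
+//    static int mem_eof(void *user)
+//    {
+//       mem_reader *r = (mem_reader *) user;
+//       return r->pos >= r->size;
+//    }
+//
+//    stbi_io_callbacks cb = { mem_read, mem_skip, mem_eof };
+//    mem_reader r = { bytes, num_bytes, 0 };
+//    unsigned char *pixels = stbi_load_from_callbacks(&cb, &r, &x, &y, &n, 0);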
+//
+// ===========================================================================
+//
+// SIMD support
+//
+// The JPEG decoder will try to automatically use SIMD kernels on x86 when
+// supported by the compiler. For ARM Neon support, you must explicitly
+// request it.
+//
+// (The old do-it-yourself SIMD API is no longer supported in the current
+// code.)
+//
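+// On x86, SSE2 will automatically be used when available: based on a run-time
+// test with MSVC, and whenever the compiler has SSE2 enabled (e.g. -msse2)
+// with GCC/Clang, where no run-time test is done; if not, the generic C
+// versions are used as a fall-back. On ARM targets,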
+// the typical path is to have separate builds for NEON and non-NEON devices
+// (at least this is true for iOS and Android). Therefore, the NEON support is
+// toggled by a build flag: define STBI_NEON to get NEON loops.
+//
+// If for some reason you do not want to use any of the SIMD code, or if
+// you have issues compiling it, you can disable it entirely by
+// defining STBI_NO_SIMD.
+//
+// ===========================================================================
+//
+// HDR image support   (disable by defining STBI_NO_HDR)
+//
+// stb_image supports loading HDR images in general, and currently the Radiance
+// .HDR file format specifically. You can still load any file through the existing
+// interface; if you attempt to load an HDR file, it will be automatically remapped
+// to LDR, assuming gamma 2.2 and an arbitrary scale factor defaulting to 1;
+// both of these constants can be reconfigured through this interface:
+//
+//     stbi_hdr_to_ldr_gamma(2.2f);
+//     stbi_hdr_to_ldr_scale(1.0f);
+//
+// (note, do not use _inverse_ constants; stb_image will invert them
+// appropriately).
+//
+// Additionally, there is a new, parallel interface for loading files as
+// (linear) floats to preserve the full dynamic range:
+//
+//    float *data = stbi_loadf(filename, &x, &y, &n, 0);
+//
+// If you load LDR images through this interface, those images will
+// be promoted to floating point values, run through the inverse of
+// constants corresponding to the above:
+//
+//     stbi_ldr_to_hdr_scale(1.0f);
+//     stbi_ldr_to_hdr_gamma(2.2f);
+//
+// Finally, given a filename (or an open file or memory block--see header
+// file for details) containing image data, you can query for the "most
+// appropriate" interface to use (that is, whether the image is HDR or
+// not), using:
+//
+//     stbi_is_hdr(char *filename);
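+//
+// A sketch of picking the interface based on that query (error handling is
+// omitted and the variable names are illustrative):
+//
+//     if (stbi_is_hdr(filename)) {
+//        float *fdata = stbi_loadf(filename, &x, &y, &n, 0);
+//        // ... use linear float components ...
+//        stbi_image_free(fdata);
+//     } else {
+//        unsigned char *data = stbi_load(filename, &x, &y, &n, 0);
+//        // ... use 8-bit components ...
+//        stbi_image_free(data);
+//     }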
+//
+// ===========================================================================
+//
+// iPhone PNG support:
+//
+// We optionally support converting iPhone-formatted PNGs (which store
+// premultiplied BGRA) back to RGB, even though they're internally encoded
+// differently. To enable this conversion, call
+// stbi_convert_iphone_png_to_rgb(1).
+//
+// Call stbi_set_unpremultiply_on_load(1) as well to force a divide per
+// pixel to remove any premultiplied alpha *only* if the image file explicitly
+// says there's premultiplied data (currently only happens in iPhone images,
+// and only if iPhone convert-to-rgb processing is on).
+//
+// ===========================================================================
+//
+// ADDITIONAL CONFIGURATION
+//
+//  - You can suppress implementation of any of the decoders to reduce
+//    your code footprint by #defining one or more of the following
+//    symbols before creating the implementation.
+//
+//        STBI_NO_JPEG
+//        STBI_NO_PNG
+//        STBI_NO_BMP
+//        STBI_NO_PSD
+//        STBI_NO_TGA
+//        STBI_NO_GIF
+//        STBI_NO_HDR
+//        STBI_NO_PIC
+//        STBI_NO_PNM   (.ppm and .pgm)
+//
+//  - You can request *only* certain decoders and suppress all other ones
+//    (this will be more forward-compatible, as addition of new decoders
+//    doesn't require you to disable them explicitly):
+//
+//        STBI_ONLY_JPEG
+//        STBI_ONLY_PNG
+//        STBI_ONLY_BMP
+//        STBI_ONLY_PSD
+//        STBI_ONLY_TGA
+//        STBI_ONLY_GIF
+//        STBI_ONLY_HDR
+//        STBI_ONLY_PIC
+//        STBI_ONLY_PNM   (.ppm and .pgm)
+//
+//  - If you use STBI_NO_PNG (or _ONLY_ without PNG), and you still
+//    want the zlib decoder to be available, #define STBI_SUPPORT_ZLIB
+//
+//  - If you define STBI_MAX_DIMENSIONS, stb_image will reject images greater
+//    than that size (in either width or height) without further processing.
+//    This is to let programs in the wild set an upper bound to prevent
+//    denial-of-service attacks on untrusted data, as one could generate a
+//    valid image of gigantic dimensions and force stb_image to allocate a
+//    huge block of memory and spend disproportionate time decoding it. By
+//    default this is set to (1 << 24), which is 16777216, but that's still
+//    very big.
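+//
+//  - As a sketch, a build that only needs PNG and JPEG and wants a tighter
+//    dimension limit might look like this (the limit of 8192 is an arbitrary
+//    illustration, not a recommendation):
+//
+//        #define STBI_ONLY_PNG
+//        #define STBI_ONLY_JPEG
+//        #define STBI_MAX_DIMENSIONS 8192
+//        #define STB_IMAGE_IMPLEMENTATION
+//        #include "stb_image.h"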
+
+#ifndef STBI_NO_STDIO
+#include <stdio.h>
+#endif // STBI_NO_STDIO
+
+#define STBI_VERSION 1
+
+enum
+{
+   STBI_default = 0, // only used for desired_channels
+
+   STBI_grey       = 1,
+   STBI_grey_alpha = 2,
+   STBI_rgb        = 3,
+   STBI_rgb_alpha  = 4
+};
+
+#include <stdlib.h>
+typedef unsigned char stbi_uc;
+typedef unsigned short stbi_us;
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+#ifndef STBIDEF
+#ifdef STB_IMAGE_STATIC
+#define STBIDEF static
+#else
+#define STBIDEF extern
+#endif
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// PRIMARY API - works on images of any type
+//
+
+//
+// load image by filename, open file, or memory buffer
+//
+
+typedef struct
+{
+   int      (*read)  (void *user,char *data,int size);   // fill 'data' with 'size' bytes.  return number of bytes actually read
+   void     (*skip)  (void *user,int n);                 // skip the next 'n' bytes, or 'unget' the last -n bytes if negative
+   int      (*eof)   (void *user);                       // returns nonzero if we are at end of file/data
+} stbi_io_callbacks;
+
+////////////////////////////////////
+//
+// 8-bits-per-channel interface
+//
+
+STBIDEF stbi_uc *stbi_load_from_memory   (stbi_uc           const *buffer, int len   , int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk  , void *user, int *x, int *y, int *channels_in_file, int desired_channels);
+
+#ifndef STBI_NO_STDIO
+STBIDEF stbi_uc *stbi_load            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_uc *stbi_load_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+// for stbi_load_from_file, file pointer is left pointing immediately after image
+#endif
+
+#ifndef STBI_NO_GIF
+STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
+#endif
+
+#ifdef STBI_WINDOWS_UTF8
+STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input);
+#endif
+
+////////////////////////////////////
+//
+// 16-bits-per-channel interface
+//
+
+STBIDEF stbi_us *stbi_load_16_from_memory   (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels);
+
+#ifndef STBI_NO_STDIO
+STBIDEF stbi_us *stbi_load_16          (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+STBIDEF stbi_us *stbi_load_from_file_16(FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+#endif
+
+////////////////////////////////////
+//
+// float-per-channel interface
+//
+#ifndef STBI_NO_LINEAR
+   STBIDEF float *stbi_loadf_from_memory     (stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels);
+   STBIDEF float *stbi_loadf_from_callbacks  (stbi_io_callbacks const *clbk, void *user, int *x, int *y,  int *channels_in_file, int desired_channels);
+
+   #ifndef STBI_NO_STDIO
+   STBIDEF float *stbi_loadf            (char const *filename, int *x, int *y, int *channels_in_file, int desired_channels);
+   STBIDEF float *stbi_loadf_from_file  (FILE *f, int *x, int *y, int *channels_in_file, int desired_channels);
+   #endif
+#endif
+
+#ifndef STBI_NO_HDR
+   STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma);
+   STBIDEF void   stbi_hdr_to_ldr_scale(float scale);
+#endif // STBI_NO_HDR
+
+#ifndef STBI_NO_LINEAR
+   STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma);
+   STBIDEF void   stbi_ldr_to_hdr_scale(float scale);
+#endif // STBI_NO_LINEAR
+
+// stbi_is_hdr is always defined, but always returns false if STBI_NO_HDR is defined
+STBIDEF int    stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+STBIDEF int    stbi_is_hdr_from_memory(stbi_uc const *buffer, int len);
+#ifndef STBI_NO_STDIO
+STBIDEF int      stbi_is_hdr          (char const *filename);
+STBIDEF int      stbi_is_hdr_from_file(FILE *f);
+#endif // STBI_NO_STDIO
+
+
+// get a VERY brief reason for failure
+// on most compilers (and ALL modern mainstream compilers) this is threadsafe
+STBIDEF const char *stbi_failure_reason  (void);
+
+// free the loaded image -- this is just free()
+STBIDEF void     stbi_image_free      (void *retval_from_stbi_load);
+
+// get image dimensions & components without fully decoding
+STBIDEF int      stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp);
+STBIDEF int      stbi_info_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp);
+STBIDEF int      stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len);
+STBIDEF int      stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *clbk, void *user);
+
+#ifndef STBI_NO_STDIO
+STBIDEF int      stbi_info               (char const *filename,     int *x, int *y, int *comp);
+STBIDEF int      stbi_info_from_file     (FILE *f,                  int *x, int *y, int *comp);
+STBIDEF int      stbi_is_16_bit          (char const *filename);
+STBIDEF int      stbi_is_16_bit_from_file(FILE *f);
+#endif
+
+
+
+// for image formats that explicitly notate that they have premultiplied alpha,
+// we just return the colors as stored in the file. set this flag to force
+// unpremultiplication. results are undefined if the unpremultiply overflows.
+STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply);
+
+// indicate whether we should process iphone images back to canonical format,
+// or just pass them through "as-is"
+STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert);
+
+// flip the image vertically, so the first pixel in the output array is the bottom left
+STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip);
+
+// as above, but only applies to images loaded on the thread that calls the function.
+// these functions are only available if your compiler supports thread-local variables;
+// calling them will fail to link if your compiler doesn't
+STBIDEF void stbi_set_unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply);
+STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert);
+STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip);
+
+// ZLIB client - used by PNG, available for other purposes
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen);
+STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header);
+STBIDEF char *stbi_zlib_decode_malloc(const char *buffer, int len, int *outlen);
+STBIDEF int   stbi_zlib_decode_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
+
+STBIDEF char *stbi_zlib_decode_noheader_malloc(const char *buffer, int len, int *outlen);
+STBIDEF int   stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+//
+//
+////   end header file   /////////////////////////////////////////////////////
+#endif // STBI_INCLUDE_STB_IMAGE_H
+
+#ifdef STB_IMAGE_IMPLEMENTATION
+
+#if defined(STBI_ONLY_JPEG) || defined(STBI_ONLY_PNG) || defined(STBI_ONLY_BMP) \
+  || defined(STBI_ONLY_TGA) || defined(STBI_ONLY_GIF) || defined(STBI_ONLY_PSD) \
+  || defined(STBI_ONLY_HDR) || defined(STBI_ONLY_PIC) || defined(STBI_ONLY_PNM) \
+  || defined(STBI_ONLY_ZLIB)
+   #ifndef STBI_ONLY_JPEG
+   #define STBI_NO_JPEG
+   #endif
+   #ifndef STBI_ONLY_PNG
+   #define STBI_NO_PNG
+   #endif
+   #ifndef STBI_ONLY_BMP
+   #define STBI_NO_BMP
+   #endif
+   #ifndef STBI_ONLY_PSD
+   #define STBI_NO_PSD
+   #endif
+   #ifndef STBI_ONLY_TGA
+   #define STBI_NO_TGA
+   #endif
+   #ifndef STBI_ONLY_GIF
+   #define STBI_NO_GIF
+   #endif
+   #ifndef STBI_ONLY_HDR
+   #define STBI_NO_HDR
+   #endif
+   #ifndef STBI_ONLY_PIC
+   #define STBI_NO_PIC
+   #endif
+   #ifndef STBI_ONLY_PNM
+   #define STBI_NO_PNM
+   #endif
+#endif
+
+#if defined(STBI_NO_PNG) && !defined(STBI_SUPPORT_ZLIB) && !defined(STBI_NO_ZLIB)
+#define STBI_NO_ZLIB
+#endif
+
+
+#include <stdarg.h>
+#include <stddef.h> // ptrdiff_t on osx
+#include <stdlib.h>
+#include <string.h>
+#include <limits.h>
+
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR)
+#include <math.h>  // ldexp, pow
+#endif
+
+#ifndef STBI_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifndef STBI_ASSERT
+#include <assert.h>
+#define STBI_ASSERT(x) assert(x)
+#endif
+
+#ifdef __cplusplus
+#define STBI_EXTERN extern "C"
+#else
+#define STBI_EXTERN extern
+#endif
+
+
+#ifndef _MSC_VER
+   #ifdef __cplusplus
+   #define stbi_inline inline
+   #else
+   #define stbi_inline
+   #endif
+#else
+   #define stbi_inline __forceinline
+#endif
+
+#ifndef STBI_NO_THREAD_LOCALS
+   #if defined(__cplusplus) &&  __cplusplus >= 201103L
+      #define STBI_THREAD_LOCAL       thread_local
+   #elif defined(__GNUC__) && __GNUC__ < 5
+      #define STBI_THREAD_LOCAL       __thread
+   #elif defined(_MSC_VER)
+      #define STBI_THREAD_LOCAL       __declspec(thread)
+   #elif defined (__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && !defined(__STDC_NO_THREADS__)
+      #define STBI_THREAD_LOCAL       _Thread_local
+   #endif
+
+   #ifndef STBI_THREAD_LOCAL
+      #if defined(__GNUC__)
+        #define STBI_THREAD_LOCAL       __thread
+      #endif
+   #endif
+#endif
+
+#ifdef _MSC_VER
+typedef unsigned short stbi__uint16;
+typedef   signed short stbi__int16;
+typedef unsigned int   stbi__uint32;
+typedef   signed int   stbi__int32;
+#else
+#include <stdint.h>
+typedef uint16_t stbi__uint16;
+typedef int16_t  stbi__int16;
+typedef uint32_t stbi__uint32;
+typedef int32_t  stbi__int32;
+#endif
+
+// should produce compiler error if size is wrong
+typedef unsigned char validate_uint32[sizeof(stbi__uint32)==4 ? 1 : -1];
+
+#ifdef _MSC_VER
+#define STBI_NOTUSED(v)  (void)(v)
+#else
+#define STBI_NOTUSED(v)  (void)sizeof(v)
+#endif
+
+#ifdef _MSC_VER
+#define STBI_HAS_LROTL
+#endif
+
+#ifdef STBI_HAS_LROTL
+   #define stbi_lrot(x,y)  _lrotl(x,y)
+#else
+   #define stbi_lrot(x,y)  (((x) << (y)) | ((x) >> (-(y) & 31)))
+#endif
+
+#if defined(STBI_MALLOC) && defined(STBI_FREE) && (defined(STBI_REALLOC) || defined(STBI_REALLOC_SIZED))
+// ok
+#elif !defined(STBI_MALLOC) && !defined(STBI_FREE) && !defined(STBI_REALLOC) && !defined(STBI_REALLOC_SIZED)
+// ok
+#else
+#error "Must define all or none of STBI_MALLOC, STBI_FREE, and STBI_REALLOC (or STBI_REALLOC_SIZED)."
+#endif
+
+#ifndef STBI_MALLOC
+#define STBI_MALLOC(sz)           malloc(sz)
+#define STBI_REALLOC(p,newsz)     realloc(p,newsz)
+#define STBI_FREE(p)              free(p)
+#endif
+
+#ifndef STBI_REALLOC_SIZED
+#define STBI_REALLOC_SIZED(p,oldsz,newsz) STBI_REALLOC(p,newsz)
+#endif
+
+// x86/x64 detection
+#if defined(__x86_64__) || defined(_M_X64)
+#define STBI__X64_TARGET
+#elif defined(__i386) || defined(_M_IX86)
+#define STBI__X86_TARGET
+#endif
+
+#if defined(__GNUC__) && defined(STBI__X86_TARGET) && !defined(__SSE2__) && !defined(STBI_NO_SIMD)
+// gcc doesn't support sse2 intrinsics unless you compile with -msse2,
+// which in turn means it gets to use SSE2 everywhere. This is unfortunate,
+// but previous attempts to provide the SSE2 functions with runtime
+// detection caused numerous issues. The way architecture extensions are
+// exposed in GCC/Clang is, sadly, not really suited for one-file libs.
+// New behavior: if compiled with -msse2, we use SSE2 without any
+// detection; if not, we don't use it at all.
+#define STBI_NO_SIMD
+#endif
+
+#if defined(__MINGW32__) && defined(STBI__X86_TARGET) && !defined(STBI_MINGW_ENABLE_SSE2) && !defined(STBI_NO_SIMD)
+// Note that __MINGW32__ doesn't actually mean 32-bit, so we have to avoid STBI__X64_TARGET
+//
+// 32-bit MinGW wants ESP to be 16-byte aligned, but this is not in the
+// Windows ABI and VC++ as well as Windows DLLs don't maintain that invariant.
+// As a result, enabling SSE2 on 32-bit MinGW is dangerous when not
+// simultaneously enabling "-mstackrealign".
+//
+// See https://github.com/nothings/stb/issues/81 for more information.
+//
+// So default to no SSE2 on 32-bit MinGW. If you've read this far and added
+// -mstackrealign to your build settings, feel free to #define STBI_MINGW_ENABLE_SSE2.
+#define STBI_NO_SIMD
+#endif
+
+#if !defined(STBI_NO_SIMD) && (defined(STBI__X86_TARGET) || defined(STBI__X64_TARGET))
+#define STBI_SSE2
+#include <emmintrin.h>
+
+#ifdef _MSC_VER
+
+#if _MSC_VER >= 1400  // not VC6
+#include <intrin.h> // __cpuid
+static int stbi__cpuid3(void)
+{
+   int info[4];
+   __cpuid(info,1);
+   return info[3];
+}
+#else
+static int stbi__cpuid3(void)
+{
+   int res;
+   __asm {
+      mov  eax,1
+      cpuid
+      mov  res,edx
+   }
+   return res;
+}
+#endif
+
+#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
+
+#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
+static int stbi__sse2_available(void)
+{
+   int info3 = stbi__cpuid3();
+   return ((info3 >> 26) & 1) != 0;
+}
+#endif
+
+#else // assume GCC-style if not VC++
+#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
+
+#if !defined(STBI_NO_JPEG) && defined(STBI_SSE2)
+static int stbi__sse2_available(void)
+{
+   // If we're even attempting to compile this on GCC/Clang, that means
+   // -msse2 is on, which means the compiler is allowed to use SSE2
+   // instructions at will, and so are we.
+   return 1;
+}
+#endif
+
+#endif
+#endif
+
+// ARM NEON
+#if defined(STBI_NO_SIMD) && defined(STBI_NEON)
+#undef STBI_NEON
+#endif
+
+#ifdef STBI_NEON
+#include <arm_neon.h>
+#ifdef _MSC_VER
+#define STBI_SIMD_ALIGN(type, name) __declspec(align(16)) type name
+#else
+#define STBI_SIMD_ALIGN(type, name) type name __attribute__((aligned(16)))
+#endif
+#endif
+
+#ifndef STBI_SIMD_ALIGN
+#define STBI_SIMD_ALIGN(type, name) type name
+#endif
+
+#ifndef STBI_MAX_DIMENSIONS
+#define STBI_MAX_DIMENSIONS (1 << 24)
+#endif
+
+///////////////////////////////////////////////
+//
+//  stbi__context struct and start_xxx functions
+
+// stbi__context structure is our basic context used by all images, so it
+// contains all the IO context, plus some basic image information
+typedef struct
+{
+   stbi__uint32 img_x, img_y;
+   int img_n, img_out_n;
+
+   stbi_io_callbacks io;
+   void *io_user_data;
+
+   int read_from_callbacks;
+   int buflen;
+   stbi_uc buffer_start[128];
+   int callback_already_read;
+
+   stbi_uc *img_buffer, *img_buffer_end;
+   stbi_uc *img_buffer_original, *img_buffer_original_end;
+} stbi__context;
+
+
+static void stbi__refill_buffer(stbi__context *s);
+
+// initialize a memory-decode context
+static void stbi__start_mem(stbi__context *s, stbi_uc const *buffer, int len)
+{
+   s->io.read = NULL;
+   s->read_from_callbacks = 0;
+   s->callback_already_read = 0;
+   s->img_buffer = s->img_buffer_original = (stbi_uc *) buffer;
+   s->img_buffer_end = s->img_buffer_original_end = (stbi_uc *) buffer+len;
+}
+
+// initialize a callback-based context
+static void stbi__start_callbacks(stbi__context *s, stbi_io_callbacks *c, void *user)
+{
+   s->io = *c;
+   s->io_user_data = user;
+   s->buflen = sizeof(s->buffer_start);
+   s->read_from_callbacks = 1;
+   s->callback_already_read = 0;
+   s->img_buffer = s->img_buffer_original = s->buffer_start;
+   stbi__refill_buffer(s);
+   s->img_buffer_original_end = s->img_buffer_end;
+}
+
+#ifndef STBI_NO_STDIO
+
+static int stbi__stdio_read(void *user, char *data, int size)
+{
+   return (int) fread(data,1,size,(FILE*) user);
+}
+
+static void stbi__stdio_skip(void *user, int n)
+{
+   int ch;
+   fseek((FILE*) user, n, SEEK_CUR);
+   ch = fgetc((FILE*) user);  /* have to read a byte to reset feof()'s flag */
+   if (ch != EOF) {
+      ungetc(ch, (FILE *) user);  /* push byte back onto stream if valid. */
+   }
+}
+
+static int stbi__stdio_eof(void *user)
+{
+   return feof((FILE*) user) || ferror((FILE *) user);
+}
+
+static stbi_io_callbacks stbi__stdio_callbacks =
+{
+   stbi__stdio_read,
+   stbi__stdio_skip,
+   stbi__stdio_eof,
+};
+
+static void stbi__start_file(stbi__context *s, FILE *f)
+{
+   stbi__start_callbacks(s, &stbi__stdio_callbacks, (void *) f);
+}
+
+//static void stop_file(stbi__context *s) { }
+
+#endif // !STBI_NO_STDIO
+
+static void stbi__rewind(stbi__context *s)
+{
+   // conceptually rewind SHOULD rewind to the beginning of the stream,
+   // but we just rewind to the beginning of the initial buffer, because
+   // we only use it after doing 'test', which only ever looks at at most 92 bytes
+   s->img_buffer = s->img_buffer_original;
+   s->img_buffer_end = s->img_buffer_original_end;
+}
+
+enum
+{
+   STBI_ORDER_RGB,
+   STBI_ORDER_BGR
+};
+
+typedef struct
+{
+   int bits_per_channel;
+   int num_channels;
+   int channel_order;
+} stbi__result_info;
+
+#ifndef STBI_NO_JPEG
+static int      stbi__jpeg_test(stbi__context *s);
+static void    *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PNG
+static int      stbi__png_test(stbi__context *s);
+static void    *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__png_info(stbi__context *s, int *x, int *y, int *comp);
+static int      stbi__png_is16(stbi__context *s);
+#endif
+
+#ifndef STBI_NO_BMP
+static int      stbi__bmp_test(stbi__context *s);
+static void    *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_TGA
+static int      stbi__tga_test(stbi__context *s);
+static void    *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__tga_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PSD
+static int      stbi__psd_test(stbi__context *s);
+static void    *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc);
+static int      stbi__psd_info(stbi__context *s, int *x, int *y, int *comp);
+static int      stbi__psd_is16(stbi__context *s);
+#endif
+
+#ifndef STBI_NO_HDR
+static int      stbi__hdr_test(stbi__context *s);
+static float   *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PIC
+static int      stbi__pic_test(stbi__context *s);
+static void    *stbi__pic_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__pic_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_GIF
+static int      stbi__gif_test(stbi__context *s);
+static void    *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static void    *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp);
+static int      stbi__gif_info(stbi__context *s, int *x, int *y, int *comp);
+#endif
+
+#ifndef STBI_NO_PNM
+static int      stbi__pnm_test(stbi__context *s);
+static void    *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri);
+static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp);
+static int      stbi__pnm_is16(stbi__context *s);
+#endif
+
+static
+#ifdef STBI_THREAD_LOCAL
+STBI_THREAD_LOCAL
+#endif
+const char *stbi__g_failure_reason;
+
+STBIDEF const char *stbi_failure_reason(void)
+{
+   return stbi__g_failure_reason;
+}
+
+#ifndef STBI_NO_FAILURE_STRINGS
+static int stbi__err(const char *str)
+{
+   stbi__g_failure_reason = str;
+   return 0;
+}
+#endif
+
+static void *stbi__malloc(size_t size)
+{
+    return STBI_MALLOC(size);
+}
+
+// stb_image uses ints pervasively, including for offset calculations.
+// therefore the largest decoded image size we can support with the
+// current code, even on 64-bit targets, is INT_MAX. this is not a
+// significant limitation for the intended use case.
+//
+// we do, however, need to make sure our size calculations don't
+// overflow. hence a few helper functions for size calculations that
+// multiply integers together, making sure that they're non-negative
+// and no overflow occurs.
+
+// returns 1 if the sum is valid, 0 on overflow.
+// negative terms are considered invalid.
+static int stbi__addsizes_valid(int a, int b)
+{
+   if (b < 0) return 0;
+   // now 0 <= b <= INT_MAX, hence also
+   // 0 <= INT_MAX - b <= INT_MAX.
+   // And "a + b <= INT_MAX" (which might overflow) is the
+   // same as a <= INT_MAX - b (no overflow)
+   return a <= INT_MAX - b;
+}
+
+// returns 1 if the product is valid, 0 on overflow.
+// negative factors are considered invalid.
+static int stbi__mul2sizes_valid(int a, int b)
+{
+   if (a < 0 || b < 0) return 0;
+   if (b == 0) return 1; // mul-by-0 is always safe
+   // portable way to check for no overflows in a*b
+   return a <= INT_MAX/b;
+}
+
+#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
+// returns 1 if "a*b + add" has no negative terms/factors and doesn't overflow
+static int stbi__mad2sizes_valid(int a, int b, int add)
+{
+   return stbi__mul2sizes_valid(a, b) && stbi__addsizes_valid(a*b, add);
+}
+#endif
+
+// returns 1 if "a*b*c + add" has no negative terms/factors and doesn't overflow
+static int stbi__mad3sizes_valid(int a, int b, int c, int add)
+{
+   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
+      stbi__addsizes_valid(a*b*c, add);
+}
+
+// returns 1 if "a*b*c*d + add" has no negative terms/factors and doesn't overflow
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
+static int stbi__mad4sizes_valid(int a, int b, int c, int d, int add)
+{
+   return stbi__mul2sizes_valid(a, b) && stbi__mul2sizes_valid(a*b, c) &&
+      stbi__mul2sizes_valid(a*b*c, d) && stbi__addsizes_valid(a*b*c*d, add);
+}
+#endif
+
+#if !defined(STBI_NO_JPEG) || !defined(STBI_NO_PNG) || !defined(STBI_NO_TGA) || !defined(STBI_NO_HDR)
+// mallocs with size overflow checking
+static void *stbi__malloc_mad2(int a, int b, int add)
+{
+   if (!stbi__mad2sizes_valid(a, b, add)) return NULL;
+   return stbi__malloc(a*b + add);
+}
+#endif
+
+static void *stbi__malloc_mad3(int a, int b, int c, int add)
+{
+   if (!stbi__mad3sizes_valid(a, b, c, add)) return NULL;
+   return stbi__malloc(a*b*c + add);
+}
+
+#if !defined(STBI_NO_LINEAR) || !defined(STBI_NO_HDR) || !defined(STBI_NO_PNM)
+static void *stbi__malloc_mad4(int a, int b, int c, int d, int add)
+{
+   if (!stbi__mad4sizes_valid(a, b, c, d, add)) return NULL;
+   return stbi__malloc(a*b*c*d + add);
+}
+#endif
+
+// stbi__err - error
+// stbi__errpf - error returning pointer to float
+// stbi__errpuc - error returning pointer to unsigned char
+
+#ifdef STBI_NO_FAILURE_STRINGS
+   #define stbi__err(x,y)  0
+#elif defined(STBI_FAILURE_USERMSG)
+   #define stbi__err(x,y)  stbi__err(y)
+#else
+   #define stbi__err(x,y)  stbi__err(x)
+#endif
+
+#define stbi__errpf(x,y)   ((float *)(size_t) (stbi__err(x,y)?NULL:NULL))
+#define stbi__errpuc(x,y)  ((unsigned char *)(size_t) (stbi__err(x,y)?NULL:NULL))
+
+STBIDEF void stbi_image_free(void *retval_from_stbi_load)
+{
+   STBI_FREE(retval_from_stbi_load);
+}
+
+#ifndef STBI_NO_LINEAR
+static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp);
+#endif
+
+#ifndef STBI_NO_HDR
+static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp);
+#endif
+
+static int stbi__vertically_flip_on_load_global = 0;
+
+STBIDEF void stbi_set_flip_vertically_on_load(int flag_true_if_should_flip)
+{
+   stbi__vertically_flip_on_load_global = flag_true_if_should_flip;
+}
+
+#ifndef STBI_THREAD_LOCAL
+#define stbi__vertically_flip_on_load  stbi__vertically_flip_on_load_global
+#else
+static STBI_THREAD_LOCAL int stbi__vertically_flip_on_load_local, stbi__vertically_flip_on_load_set;
+
+STBIDEF void stbi_set_flip_vertically_on_load_thread(int flag_true_if_should_flip)
+{
+   stbi__vertically_flip_on_load_local = flag_true_if_should_flip;
+   stbi__vertically_flip_on_load_set = 1;
+}
+
+#define stbi__vertically_flip_on_load  (stbi__vertically_flip_on_load_set       \
+                                         ? stbi__vertically_flip_on_load_local  \
+                                         : stbi__vertically_flip_on_load_global)
+#endif // STBI_THREAD_LOCAL
+
+static void *stbi__load_main(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
+{
+   memset(ri, 0, sizeof(*ri)); // make sure it's initialized if we add new fields
+   ri->bits_per_channel = 8; // default is 8 so most paths don't have to be changed
+   ri->channel_order = STBI_ORDER_RGB; // all current input & output are this, but this is here so we can add BGR order
+   ri->num_channels = 0;
+
+   // test the formats with a very explicit header first (at least a FOURCC
+   // or distinctive magic number)
+   #ifndef STBI_NO_PNG
+   if (stbi__png_test(s))  return stbi__png_load(s,x,y,comp,req_comp, ri);
+   #endif
+   #ifndef STBI_NO_BMP
+   if (stbi__bmp_test(s))  return stbi__bmp_load(s,x,y,comp,req_comp, ri);
+   #endif
+   #ifndef STBI_NO_GIF
+   if (stbi__gif_test(s))  return stbi__gif_load(s,x,y,comp,req_comp, ri);
+   #endif
+   #ifndef STBI_NO_PSD
+   if (stbi__psd_test(s))  return stbi__psd_load(s,x,y,comp,req_comp, ri, bpc);
+   #else
+   STBI_NOTUSED(bpc);
+   #endif
+   #ifndef STBI_NO_PIC
+   if (stbi__pic_test(s))  return stbi__pic_load(s,x,y,comp,req_comp, ri);
+   #endif
+
+   // then the formats that can end up attempting to load with just 1 or 2
+   // bytes matching expectations; these are prone to false positives, so
+   // try them later
+   #ifndef STBI_NO_JPEG
+   if (stbi__jpeg_test(s)) return stbi__jpeg_load(s,x,y,comp,req_comp, ri);
+   #endif
+   #ifndef STBI_NO_PNM
+   if (stbi__pnm_test(s))  return stbi__pnm_load(s,x,y,comp,req_comp, ri);
+   #endif
+
+   #ifndef STBI_NO_HDR
+   if (stbi__hdr_test(s)) {
+      float *hdr = stbi__hdr_load(s, x,y,comp,req_comp, ri);
+      return stbi__hdr_to_ldr(hdr, *x, *y, req_comp ? req_comp : *comp);
+   }
+   #endif
+
+   #ifndef STBI_NO_TGA
+   // test tga last because it's a crappy test!
+   if (stbi__tga_test(s))
+      return stbi__tga_load(s,x,y,comp,req_comp, ri);
+   #endif
+
+   return stbi__errpuc("unknown image type", "Image not of any known type, or corrupt");
+}
+
+static stbi_uc *stbi__convert_16_to_8(stbi__uint16 *orig, int w, int h, int channels)
+{
+   int i;
+   int img_len = w * h * channels;
+   stbi_uc *reduced;
+
+   reduced = (stbi_uc *) stbi__malloc(img_len);
+   if (reduced == NULL) return stbi__errpuc("outofmem", "Out of memory");
+
+   for (i = 0; i < img_len; ++i)
+      reduced[i] = (stbi_uc)((orig[i] >> 8) & 0xFF); // top byte of each 16-bit sample is a sufficient approximation of 16->8 bit scaling
+
+   STBI_FREE(orig);
+   return reduced;
+}
+
+static stbi__uint16 *stbi__convert_8_to_16(stbi_uc *orig, int w, int h, int channels)
+{
+   int i;
+   int img_len = w * h * channels;
+   stbi__uint16 *enlarged;
+
+   enlarged = (stbi__uint16 *) stbi__malloc(img_len*2);
+   if (enlarged == NULL) return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
+
+   for (i = 0; i < img_len; ++i)
+      enlarged[i] = (stbi__uint16)((orig[i] << 8) + orig[i]); // replicate to high and low byte, maps 0->0, 255->0xffff
+
+   STBI_FREE(orig);
+   return enlarged;
+}
+
+static void stbi__vertical_flip(void *image, int w, int h, int bytes_per_pixel)
+{
+   int row;
+   size_t bytes_per_row = (size_t)w * bytes_per_pixel;
+   stbi_uc temp[2048];
+   stbi_uc *bytes = (stbi_uc *)image;
+
+   for (row = 0; row < (h>>1); row++) {
+      stbi_uc *row0 = bytes + row*bytes_per_row;
+      stbi_uc *row1 = bytes + (h - row - 1)*bytes_per_row;
+      // swap row0 with row1
+      size_t bytes_left = bytes_per_row;
+      while (bytes_left) {
+         size_t bytes_copy = (bytes_left < sizeof(temp)) ? bytes_left : sizeof(temp);
+         memcpy(temp, row0, bytes_copy);
+         memcpy(row0, row1, bytes_copy);
+         memcpy(row1, temp, bytes_copy);
+         row0 += bytes_copy;
+         row1 += bytes_copy;
+         bytes_left -= bytes_copy;
+      }
+   }
+}
+
+#ifndef STBI_NO_GIF
+static void stbi__vertical_flip_slices(void *image, int w, int h, int z, int bytes_per_pixel)
+{
+   int slice;
+   int slice_size = w * h * bytes_per_pixel;
+
+   stbi_uc *bytes = (stbi_uc *)image;
+   for (slice = 0; slice < z; ++slice) {
+      stbi__vertical_flip(bytes, w, h, bytes_per_pixel);
+      bytes += slice_size;
+   }
+}
+#endif
+
+static unsigned char *stbi__load_and_postprocess_8bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__result_info ri;
+   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 8);
+
+   if (result == NULL)
+      return NULL;
+
+   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
+   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
+
+   if (ri.bits_per_channel != 8) {
+      result = stbi__convert_16_to_8((stbi__uint16 *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
+      ri.bits_per_channel = 8;
+   }
+
+   // @TODO: move stbi__convert_format to here
+
+   if (stbi__vertically_flip_on_load) {
+      int channels = req_comp ? req_comp : *comp;
+      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi_uc));
+   }
+
+   return (unsigned char *) result;
+}
+
+static stbi__uint16 *stbi__load_and_postprocess_16bit(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__result_info ri;
+   void *result = stbi__load_main(s, x, y, comp, req_comp, &ri, 16);
+
+   if (result == NULL)
+      return NULL;
+
+   // it is the responsibility of the loaders to make sure we get either 8 or 16 bit.
+   STBI_ASSERT(ri.bits_per_channel == 8 || ri.bits_per_channel == 16);
+
+   if (ri.bits_per_channel != 16) {
+      result = stbi__convert_8_to_16((stbi_uc *) result, *x, *y, req_comp == 0 ? *comp : req_comp);
+      ri.bits_per_channel = 16;
+   }
+
+   // @TODO: move stbi__convert_format16 to here
+   // @TODO: special case RGB-to-Y (and RGBA-to-YA) for 8-bit-to-16-bit case to keep more precision
+
+   if (stbi__vertically_flip_on_load) {
+      int channels = req_comp ? req_comp : *comp;
+      stbi__vertical_flip(result, *x, *y, channels * sizeof(stbi__uint16));
+   }
+
+   return (stbi__uint16 *) result;
+}
+
+#if !defined(STBI_NO_HDR) && !defined(STBI_NO_LINEAR)
+static void stbi__float_postprocess(float *result, int *x, int *y, int *comp, int req_comp)
+{
+   if (stbi__vertically_flip_on_load && result != NULL) {
+      int channels = req_comp ? req_comp : *comp;
+      stbi__vertical_flip(result, *x, *y, channels * sizeof(float));
+   }
+}
+#endif
+
+#ifndef STBI_NO_STDIO
+
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+STBI_EXTERN __declspec(dllimport) int __stdcall MultiByteToWideChar(unsigned int cp, unsigned long flags, const char *str, int cbmb, wchar_t *widestr, int cchwide);
+STBI_EXTERN __declspec(dllimport) int __stdcall WideCharToMultiByte(unsigned int cp, unsigned long flags, const wchar_t *widestr, int cchwide, char *str, int cbmb, const char *defchar, int *used_default);
+#endif
+
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+STBIDEF int stbi_convert_wchar_to_utf8(char *buffer, size_t bufferlen, const wchar_t* input)
+{
+	return WideCharToMultiByte(65001 /* UTF8 */, 0, input, -1, buffer, (int) bufferlen, NULL, NULL);
+}
+#endif
+
+static FILE *stbi__fopen(char const *filename, char const *mode)
+{
+   FILE *f;
+#if defined(_WIN32) && defined(STBI_WINDOWS_UTF8)
+   wchar_t wMode[64];
+   wchar_t wFilename[1024];
+	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, filename, -1, wFilename, sizeof(wFilename)/sizeof(*wFilename)))
+      return 0;
+
+	if (0 == MultiByteToWideChar(65001 /* UTF8 */, 0, mode, -1, wMode, sizeof(wMode)/sizeof(*wMode)))
+      return 0;
+
+#if defined(_MSC_VER) && _MSC_VER >= 1400
+	if (0 != _wfopen_s(&f, wFilename, wMode))
+		f = 0;
+#else
+   f = _wfopen(wFilename, wMode);
+#endif
+
+#elif defined(_MSC_VER) && _MSC_VER >= 1400
+   if (0 != fopen_s(&f, filename, mode))
+      f=0;
+#else
+   f = fopen(filename, mode);
+#endif
+   return f;
+}
+
+
+STBIDEF stbi_uc *stbi_load(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+   FILE *f = stbi__fopen(filename, "rb");
+   unsigned char *result;
+   if (!f) return stbi__errpuc("can't fopen", "Unable to open file");
+   result = stbi_load_from_file(f,x,y,comp,req_comp);
+   fclose(f);
+   return result;
+}
+
+STBIDEF stbi_uc *stbi_load_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+   unsigned char *result;
+   stbi__context s;
+   stbi__start_file(&s,f);
+   result = stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+   if (result) {
+      // need to 'unget' all the characters in the IO buffer
+      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
+   }
+   return result;
+}
+
+STBIDEF stbi__uint16 *stbi_load_from_file_16(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__uint16 *result;
+   stbi__context s;
+   stbi__start_file(&s,f);
+   result = stbi__load_and_postprocess_16bit(&s,x,y,comp,req_comp);
+   if (result) {
+      // need to 'unget' all the characters in the IO buffer
+      fseek(f, - (int) (s.img_buffer_end - s.img_buffer), SEEK_CUR);
+   }
+   return result;
+}
+
+STBIDEF stbi_us *stbi_load_16(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+   FILE *f = stbi__fopen(filename, "rb");
+   stbi__uint16 *result;
+   if (!f) return (stbi_us *) stbi__errpuc("can't fopen", "Unable to open file");
+   result = stbi_load_from_file_16(f,x,y,comp,req_comp);
+   fclose(f);
+   return result;
+}
+
+
+#endif //!STBI_NO_STDIO
+
+STBIDEF stbi_us *stbi_load_16_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *channels_in_file, int desired_channels)
+{
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
+}
+
+STBIDEF stbi_us *stbi_load_16_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *channels_in_file, int desired_channels)
+{
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *)clbk, user);
+   return stbi__load_and_postprocess_16bit(&s,x,y,channels_in_file,desired_channels);
+}
+
+STBIDEF stbi_uc *stbi_load_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+}
+
+STBIDEF stbi_uc *stbi_load_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+   return stbi__load_and_postprocess_8bit(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_GIF
+STBIDEF stbi_uc *stbi_load_gif_from_memory(stbi_uc const *buffer, int len, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
+{
+   unsigned char *result;
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+
+   result = (unsigned char*) stbi__load_gif_main(&s, delays, x, y, z, comp, req_comp);
+   if (result && stbi__vertically_flip_on_load) {
+      stbi__vertical_flip_slices( result, *x, *y, *z, *comp );
+   }
+
+   return result;
+}
+#endif
+
+#ifndef STBI_NO_LINEAR
+static float *stbi__loadf_main(stbi__context *s, int *x, int *y, int *comp, int req_comp)
+{
+   unsigned char *data;
+   #ifndef STBI_NO_HDR
+   if (stbi__hdr_test(s)) {
+      stbi__result_info ri;
+      float *hdr_data = stbi__hdr_load(s,x,y,comp,req_comp, &ri);
+      if (hdr_data)
+         stbi__float_postprocess(hdr_data,x,y,comp,req_comp);
+      return hdr_data;
+   }
+   #endif
+   data = stbi__load_and_postprocess_8bit(s, x, y, comp, req_comp);
+   if (data)
+      return stbi__ldr_to_hdr(data, *x, *y, req_comp ? req_comp : *comp);
+   return stbi__errpf("unknown image type", "Image not of any known type, or corrupt");
+}
+
+STBIDEF float *stbi_loadf_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+
+STBIDEF float *stbi_loadf_from_callbacks(stbi_io_callbacks const *clbk, void *user, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+   return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF float *stbi_loadf(char const *filename, int *x, int *y, int *comp, int req_comp)
+{
+   float *result;
+   FILE *f = stbi__fopen(filename, "rb");
+   if (!f) return stbi__errpf("can't fopen", "Unable to open file");
+   result = stbi_loadf_from_file(f,x,y,comp,req_comp);
+   fclose(f);
+   return result;
+}
+
+STBIDEF float *stbi_loadf_from_file(FILE *f, int *x, int *y, int *comp, int req_comp)
+{
+   stbi__context s;
+   stbi__start_file(&s,f);
+   return stbi__loadf_main(&s,x,y,comp,req_comp);
+}
+#endif // !STBI_NO_STDIO
+
+#endif // !STBI_NO_LINEAR
+
+// these is-hdr-or-not functions are defined independently of whether
+// STBI_NO_HDR is defined, for API simplicity; if STBI_NO_HDR is defined,
+// they always report false!
+
+STBIDEF int stbi_is_hdr_from_memory(stbi_uc const *buffer, int len)
+{
+   #ifndef STBI_NO_HDR
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__hdr_test(&s);
+   #else
+   STBI_NOTUSED(buffer);
+   STBI_NOTUSED(len);
+   return 0;
+   #endif
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF int      stbi_is_hdr          (char const *filename)
+{
+   FILE *f = stbi__fopen(filename, "rb");
+   int result=0;
+   if (f) {
+      result = stbi_is_hdr_from_file(f);
+      fclose(f);
+   }
+   return result;
+}
+
+STBIDEF int stbi_is_hdr_from_file(FILE *f)
+{
+   #ifndef STBI_NO_HDR
+   long pos = ftell(f);
+   int res;
+   stbi__context s;
+   stbi__start_file(&s,f);
+   res = stbi__hdr_test(&s);
+   fseek(f, pos, SEEK_SET);
+   return res;
+   #else
+   STBI_NOTUSED(f);
+   return 0;
+   #endif
+}
+#endif // !STBI_NO_STDIO
+
+STBIDEF int      stbi_is_hdr_from_callbacks(stbi_io_callbacks const *clbk, void *user)
+{
+   #ifndef STBI_NO_HDR
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *) clbk, user);
+   return stbi__hdr_test(&s);
+   #else
+   STBI_NOTUSED(clbk);
+   STBI_NOTUSED(user);
+   return 0;
+   #endif
+}
+
+#ifndef STBI_NO_LINEAR
+static float stbi__l2h_gamma=2.2f, stbi__l2h_scale=1.0f;
+
+STBIDEF void   stbi_ldr_to_hdr_gamma(float gamma) { stbi__l2h_gamma = gamma; }
+STBIDEF void   stbi_ldr_to_hdr_scale(float scale) { stbi__l2h_scale = scale; }
+#endif
+
+static float stbi__h2l_gamma_i=1.0f/2.2f, stbi__h2l_scale_i=1.0f;
+
+STBIDEF void   stbi_hdr_to_ldr_gamma(float gamma) { stbi__h2l_gamma_i = 1/gamma; }
+STBIDEF void   stbi_hdr_to_ldr_scale(float scale) { stbi__h2l_scale_i = 1/scale; }
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+// Common code used by all image loaders
+//
+
+enum
+{
+   STBI__SCAN_load=0,
+   STBI__SCAN_type,
+   STBI__SCAN_header
+};
+
+static void stbi__refill_buffer(stbi__context *s)
+{
+   int n = (s->io.read)(s->io_user_data,(char*)s->buffer_start,s->buflen);
+   s->callback_already_read += (int) (s->img_buffer - s->img_buffer_original);
+   if (n == 0) {
+      // at end of file, treat same as if from memory, but need to handle case
+      // where s->img_buffer isn't pointing to safe memory, e.g. 0-byte file
+      s->read_from_callbacks = 0;
+      s->img_buffer = s->buffer_start;
+      s->img_buffer_end = s->buffer_start+1;
+      *s->img_buffer = 0;
+   } else {
+      s->img_buffer = s->buffer_start;
+      s->img_buffer_end = s->buffer_start + n;
+   }
+}
+
+stbi_inline static stbi_uc stbi__get8(stbi__context *s)
+{
+   if (s->img_buffer < s->img_buffer_end)
+      return *s->img_buffer++;
+   if (s->read_from_callbacks) {
+      stbi__refill_buffer(s);
+      return *s->img_buffer++;
+   }
+   return 0;
+}
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_HDR) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+stbi_inline static int stbi__at_eof(stbi__context *s)
+{
+   if (s->io.read) {
+      if (!(s->io.eof)(s->io_user_data)) return 0;
+      // if feof() is true, check if buffer = end
+      // special case: we've only got the special 0 character at the end
+      if (s->read_from_callbacks == 0) return 1;
+   }
+
+   return s->img_buffer >= s->img_buffer_end;
+}
+#endif
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC)
+// nothing
+#else
+static void stbi__skip(stbi__context *s, int n)
+{
+   if (n == 0) return;  // already there!
+   if (n < 0) {
+      s->img_buffer = s->img_buffer_end;
+      return;
+   }
+   if (s->io.read) {
+      int blen = (int) (s->img_buffer_end - s->img_buffer);
+      if (blen < n) {
+         s->img_buffer = s->img_buffer_end;
+         (s->io.skip)(s->io_user_data, n - blen);
+         return;
+      }
+   }
+   s->img_buffer += n;
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_TGA) && defined(STBI_NO_HDR) && defined(STBI_NO_PNM)
+// nothing
+#else
+static int stbi__getn(stbi__context *s, stbi_uc *buffer, int n)
+{
+   if (s->io.read) {
+      int blen = (int) (s->img_buffer_end - s->img_buffer);
+      if (blen < n) {
+         int res, count;
+
+         memcpy(buffer, s->img_buffer, blen);
+
+         count = (s->io.read)(s->io_user_data, (char*) buffer + blen, n - blen);
+         res = (count == (n-blen));
+         s->img_buffer = s->img_buffer_end;
+         return res;
+      }
+   }
+
+   if (s->img_buffer+n <= s->img_buffer_end) {
+      memcpy(buffer, s->img_buffer, n);
+      s->img_buffer += n;
+      return 1;
+   } else
+      return 0;
+}
+#endif
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
+// nothing
+#else
+static int stbi__get16be(stbi__context *s)
+{
+   int z = stbi__get8(s);
+   return (z << 8) + stbi__get8(s);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD) && defined(STBI_NO_PIC)
+// nothing
+#else
+static stbi__uint32 stbi__get32be(stbi__context *s)
+{
+   stbi__uint32 z = stbi__get16be(s);
+   return (z << 16) + stbi__get16be(s);
+}
+#endif
+
+#if defined(STBI_NO_BMP) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF)
+// nothing
+#else
+static int stbi__get16le(stbi__context *s)
+{
+   int z = stbi__get8(s);
+   return z + (stbi__get8(s) << 8);
+}
+#endif
+
+#ifndef STBI_NO_BMP
+static stbi__uint32 stbi__get32le(stbi__context *s)
+{
+   stbi__uint32 z = stbi__get16le(s);
+   z += (stbi__uint32)stbi__get16le(s) << 16;
+   return z;
+}
+#endif
+
+#define STBI__BYTECAST(x)  ((stbi_uc) ((x) & 255))  // truncate int to byte without warnings
+
+#if defined(STBI_NO_JPEG) && defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+//////////////////////////////////////////////////////////////////////////////
+//
+//  generic converter from built-in img_n to req_comp
+//    individual types do this automatically as much as possible (e.g. jpeg
+//    does all cases internally since it needs to colorspace convert anyway,
+//    and it never has alpha, so very few cases). png can automatically
+//    interleave an alpha=255 channel, but falls back to this for other cases
+//
+//  assume data buffer is malloced, so malloc a new one and free that one
+//  only failure mode is malloc failing
+
+static stbi_uc stbi__compute_y(int r, int g, int b)
+{
+   return (stbi_uc) (((r*77) + (g*150) +  (29*b)) >> 8);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_BMP) && defined(STBI_NO_PSD) && defined(STBI_NO_TGA) && defined(STBI_NO_GIF) && defined(STBI_NO_PIC) && defined(STBI_NO_PNM)
+// nothing
+#else
+static unsigned char *stbi__convert_format(unsigned char *data, int img_n, int req_comp, unsigned int x, unsigned int y)
+{
+   int i,j;
+   unsigned char *good;
+
+   if (req_comp == img_n) return data;
+   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
+
+   good = (unsigned char *) stbi__malloc_mad3(req_comp, x, y, 0);
+   if (good == NULL) {
+      STBI_FREE(data);
+      return stbi__errpuc("outofmem", "Out of memory");
+   }
+
+   for (j=0; j < (int) y; ++j) {
+      unsigned char *src  = data + j * x * img_n   ;
+      unsigned char *dest = good + j * x * req_comp;
+
+      #define STBI__COMBO(a,b)  ((a)*8+(b))
+      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
+      // convert source image with img_n components to one with req_comp components;
+      // avoid switch per pixel, so use switch per scanline and massive macros
+      switch (STBI__COMBO(img_n, req_comp)) {
+         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=255;                                     } break;
+         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
+         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=255;                     } break;
+         STBI__CASE(2,1) { dest[0]=src[0];                                                  } break;
+         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                  } break;
+         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                  } break;
+         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=255;        } break;
+         STBI__CASE(3,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
+         STBI__CASE(3,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = 255;    } break;
+         STBI__CASE(4,1) { dest[0]=stbi__compute_y(src[0],src[1],src[2]);                   } break;
+         STBI__CASE(4,2) { dest[0]=stbi__compute_y(src[0],src[1],src[2]); dest[1] = src[3]; } break;
+         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                    } break;
+         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return stbi__errpuc("unsupported", "Unsupported format conversion");
+      }
+      #undef STBI__CASE
+   }
+
+   STBI_FREE(data);
+   return good;
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
+// nothing
+#else
+static stbi__uint16 stbi__compute_y_16(int r, int g, int b)
+{
+   return (stbi__uint16) (((r*77) + (g*150) +  (29*b)) >> 8);
+}
+#endif
+
+#if defined(STBI_NO_PNG) && defined(STBI_NO_PSD)
+// nothing
+#else
+static stbi__uint16 *stbi__convert_format16(stbi__uint16 *data, int img_n, int req_comp, unsigned int x, unsigned int y)
+{
+   int i,j;
+   stbi__uint16 *good;
+
+   if (req_comp == img_n) return data;
+   STBI_ASSERT(req_comp >= 1 && req_comp <= 4);
+
+   good = (stbi__uint16 *) stbi__malloc(req_comp * x * y * 2);
+   if (good == NULL) {
+      STBI_FREE(data);
+      return (stbi__uint16 *) stbi__errpuc("outofmem", "Out of memory");
+   }
+
+   for (j=0; j < (int) y; ++j) {
+      stbi__uint16 *src  = data + j * x * img_n   ;
+      stbi__uint16 *dest = good + j * x * req_comp;
+
+      #define STBI__COMBO(a,b)  ((a)*8+(b))
+      #define STBI__CASE(a,b)   case STBI__COMBO(a,b): for(i=x-1; i >= 0; --i, src += a, dest += b)
+      // convert source image with img_n components to one with req_comp components;
+      // avoid switch per pixel, so use switch per scanline and massive macros
+      switch (STBI__COMBO(img_n, req_comp)) {
+         STBI__CASE(1,2) { dest[0]=src[0]; dest[1]=0xffff;                                     } break;
+         STBI__CASE(1,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
+         STBI__CASE(1,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=0xffff;                     } break;
+         STBI__CASE(2,1) { dest[0]=src[0];                                                     } break;
+         STBI__CASE(2,3) { dest[0]=dest[1]=dest[2]=src[0];                                     } break;
+         STBI__CASE(2,4) { dest[0]=dest[1]=dest[2]=src[0]; dest[3]=src[1];                     } break;
+         STBI__CASE(3,4) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];dest[3]=0xffff;        } break;
+         STBI__CASE(3,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
+         STBI__CASE(3,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = 0xffff; } break;
+         STBI__CASE(4,1) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]);                   } break;
+         STBI__CASE(4,2) { dest[0]=stbi__compute_y_16(src[0],src[1],src[2]); dest[1] = src[3]; } break;
+         STBI__CASE(4,3) { dest[0]=src[0];dest[1]=src[1];dest[2]=src[2];                       } break;
+         default: STBI_ASSERT(0); STBI_FREE(data); STBI_FREE(good); return (stbi__uint16*) stbi__errpuc("unsupported", "Unsupported format conversion");
+      }
+      #undef STBI__CASE
+   }
+
+   STBI_FREE(data);
+   return good;
+}
+#endif
+
+#ifndef STBI_NO_LINEAR
+static float   *stbi__ldr_to_hdr(stbi_uc *data, int x, int y, int comp)
+{
+   int i,k,n;
+   float *output;
+   if (!data) return NULL;
+   output = (float *) stbi__malloc_mad4(x, y, comp, sizeof(float), 0);
+   if (output == NULL) { STBI_FREE(data); return stbi__errpf("outofmem", "Out of memory"); }
+   // compute number of non-alpha components
+   if (comp & 1) n = comp; else n = comp-1;
+   for (i=0; i < x*y; ++i) {
+      for (k=0; k < n; ++k) {
+         output[i*comp + k] = (float) (pow(data[i*comp+k]/255.0f, stbi__l2h_gamma) * stbi__l2h_scale);
+      }
+   }
+   if (n < comp) {
+      for (i=0; i < x*y; ++i) {
+         output[i*comp + n] = data[i*comp + n]/255.0f;
+      }
+   }
+   STBI_FREE(data);
+   return output;
+}
+#endif
+
+#ifndef STBI_NO_HDR
+#define stbi__float2int(x)   ((int) (x))
+static stbi_uc *stbi__hdr_to_ldr(float   *data, int x, int y, int comp)
+{
+   int i,k,n;
+   stbi_uc *output;
+   if (!data) return NULL;
+   output = (stbi_uc *) stbi__malloc_mad3(x, y, comp, 0);
+   if (output == NULL) { STBI_FREE(data); return stbi__errpuc("outofmem", "Out of memory"); }
+   // compute number of non-alpha components
+   if (comp & 1) n = comp; else n = comp-1;
+   for (i=0; i < x*y; ++i) {
+      for (k=0; k < n; ++k) {
+         float z = (float) pow(data[i*comp+k]*stbi__h2l_scale_i, stbi__h2l_gamma_i) * 255 + 0.5f;
+         if (z < 0) z = 0;
+         if (z > 255) z = 255;
+         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
+      }
+      if (k < comp) {
+         float z = data[i*comp+k] * 255 + 0.5f;
+         if (z < 0) z = 0;
+         if (z > 255) z = 255;
+         output[i*comp + k] = (stbi_uc) stbi__float2int(z);
+      }
+   }
+   STBI_FREE(data);
+   return output;
+}
+#endif
+
+//////////////////////////////////////////////////////////////////////////////
+//
+//  "baseline" JPEG/JFIF decoder
+//
+//    simple implementation
+//      - doesn't support delayed output of y-dimension
+//      - simple interface (only one output format: 8-bit interleaved RGB)
+//      - doesn't try to recover corrupt jpegs
+//      - doesn't allow partial loading, loading multiple at once
+//      - still fast on x86 (copying globals into locals doesn't help x86)
+//      - allocates lots of intermediate memory (full size of all components)
+//        - non-interleaved case requires this anyway
+//        - allows good upsampling (see next)
+//    high-quality
+//      - upsampled channels are bilinearly interpolated, even across blocks
+//      - quality integer IDCT derived from IJG's 'slow'
+//    performance
+//      - fast huffman; reasonable integer IDCT
+//      - some SIMD kernels for common paths on targets with SSE2/NEON
+//      - uses a lot of intermediate memory, could cache poorly
+
+#ifndef STBI_NO_JPEG
+
+// huffman decoding acceleration
+#define FAST_BITS   9  // larger handles more cases; smaller stomps less cache
+
+typedef struct
+{
+   stbi_uc  fast[1 << FAST_BITS];
+   // weirdly, repacking this into AoS is a 10% speed loss, instead of a win
+   stbi__uint16 code[256];
+   stbi_uc  values[256];
+   stbi_uc  size[257];
+   unsigned int maxcode[18];
+   int    delta[17];   // old 'firstsymbol' - old 'firstcode'
+} stbi__huffman;
+
+typedef struct
+{
+   stbi__context *s;
+   stbi__huffman huff_dc[4];
+   stbi__huffman huff_ac[4];
+   stbi__uint16 dequant[4][64];
+   stbi__int16 fast_ac[4][1 << FAST_BITS];
+
+// sizes for components, interleaved MCUs
+   int img_h_max, img_v_max;
+   int img_mcu_x, img_mcu_y;
+   int img_mcu_w, img_mcu_h;
+
+// definition of jpeg image component
+   struct
+   {
+      int id;
+      int h,v;
+      int tq;
+      int hd,ha;
+      int dc_pred;
+
+      int x,y,w2,h2;
+      stbi_uc *data;
+      void *raw_data, *raw_coeff;
+      stbi_uc *linebuf;
+      short   *coeff;   // progressive only
+      int      coeff_w, coeff_h; // number of 8x8 coefficient blocks
+   } img_comp[4];
+
+   stbi__uint32   code_buffer; // jpeg entropy-coded buffer
+   int            code_bits;   // number of valid bits
+   unsigned char  marker;      // marker seen while filling entropy buffer
+   int            nomore;      // flag if we saw a marker so must stop
+
+   int            progressive;
+   int            spec_start;
+   int            spec_end;
+   int            succ_high;
+   int            succ_low;
+   int            eob_run;
+   int            jfif;
+   int            app14_color_transform; // Adobe APP14 tag
+   int            rgb;
+
+   int scan_n, order[4];
+   int restart_interval, todo;
+
+// kernels
+   void (*idct_block_kernel)(stbi_uc *out, int out_stride, short data[64]);
+   void (*YCbCr_to_RGB_kernel)(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step);
+   stbi_uc *(*resample_row_hv_2_kernel)(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs);
+} stbi__jpeg;
+
+static int stbi__build_huffman(stbi__huffman *h, int *count)
+{
+   int i,j,k=0;
+   unsigned int code;
+   // build size list for each symbol (from JPEG spec)
+   for (i=0; i < 16; ++i)
+      for (j=0; j < count[i]; ++j)
+         h->size[k++] = (stbi_uc) (i+1);
+   h->size[k] = 0;
+
+   // compute the actual Huffman code for each symbol (from JPEG spec)
+   code = 0;
+   k = 0;
+   for(j=1; j <= 16; ++j) {
+      // compute delta to add to code to compute symbol id
+      h->delta[j] = k - code;
+      if (h->size[k] == j) {
+         while (h->size[k] == j)
+            h->code[k++] = (stbi__uint16) (code++);
+         if (code-1 >= (1u << j)) return stbi__err("bad code lengths","Corrupt JPEG");
+      }
+      // compute largest code + 1 for this size, preshifted as needed later
+      h->maxcode[j] = code << (16-j);
+      code <<= 1;
+   }
+   h->maxcode[j] = 0xffffffff;
+
+   // build non-spec acceleration table; 255 is flag for not-accelerated
+   memset(h->fast, 255, 1 << FAST_BITS);
+   for (i=0; i < k; ++i) {
+      int s = h->size[i];
+      if (s <= FAST_BITS) {
+         int c = h->code[i] << (FAST_BITS-s);
+         int m = 1 << (FAST_BITS-s);
+         for (j=0; j < m; ++j) {
+            h->fast[c+j] = (stbi_uc) i;
+         }
+      }
+   }
+   return 1;
+}
+
+// build a table that decodes both magnitude and value of small ACs in
+// one go.
+static void stbi__build_fast_ac(stbi__int16 *fast_ac, stbi__huffman *h)
+{
+   int i;
+   for (i=0; i < (1 << FAST_BITS); ++i) {
+      stbi_uc fast = h->fast[i];
+      fast_ac[i] = 0;
+      if (fast < 255) {
+         int rs = h->values[fast];
+         int run = (rs >> 4) & 15;
+         int magbits = rs & 15;
+         int len = h->size[fast];
+
+         if (magbits && len + magbits <= FAST_BITS) {
+            // magnitude code followed by receive_extend code
+            int k = ((i << len) & ((1 << FAST_BITS) - 1)) >> (FAST_BITS - magbits);
+            int m = 1 << (magbits - 1);
+            if (k < m) k += (~0U << magbits) + 1;
+            // if the result is small enough, we can fit it in fast_ac table
+            if (k >= -128 && k <= 127)
+               fast_ac[i] = (stbi__int16) ((k * 256) + (run * 16) + (len + magbits));
+         }
+      }
+   }
+}
+
+static void stbi__grow_buffer_unsafe(stbi__jpeg *j)
+{
+   do {
+      unsigned int b = j->nomore ? 0 : stbi__get8(j->s);
+      if (b == 0xff) {
+         int c = stbi__get8(j->s);
+         while (c == 0xff) c = stbi__get8(j->s); // consume fill bytes
+         if (c != 0) {
+            j->marker = (unsigned char) c;
+            j->nomore = 1;
+            return;
+         }
+      }
+      j->code_buffer |= b << (24 - j->code_bits);
+      j->code_bits += 8;
+   } while (j->code_bits <= 24);
+}
+
+// (1 << n) - 1
+static const stbi__uint32 stbi__bmask[17]={0,1,3,7,15,31,63,127,255,511,1023,2047,4095,8191,16383,32767,65535};
+
+// decode a jpeg huffman value from the bitstream
+stbi_inline static int stbi__jpeg_huff_decode(stbi__jpeg *j, stbi__huffman *h)
+{
+   unsigned int temp;
+   int c,k;
+
+   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+
+   // look at the top FAST_BITS and determine what symbol ID it is,
+   // if the code length is <= FAST_BITS
+   c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+   k = h->fast[c];
+   if (k < 255) {
+      int s = h->size[k];
+      if (s > j->code_bits)
+         return -1;
+      j->code_buffer <<= s;
+      j->code_bits -= s;
+      return h->values[k];
+   }
+
+   // naive test is to shift the code_buffer down so k bits are
+   // valid, then test against maxcode. To speed this up, we've
+   // preshifted maxcode left so that it has (16-k) 0s at the
+   // end; in other words, regardless of the number of bits, it
+   // wants to be compared against something shifted to have 16;
+   // that way we don't need to shift inside the loop.
+   temp = j->code_buffer >> 16;
+   for (k=FAST_BITS+1 ; ; ++k)
+      if (temp < h->maxcode[k])
+         break;
+   if (k == 17) {
+      // error! code not found
+      j->code_bits -= 16;
+      return -1;
+   }
+
+   if (k > j->code_bits)
+      return -1;
+
+   // convert the huffman code to the symbol id
+   c = ((j->code_buffer >> (32 - k)) & stbi__bmask[k]) + h->delta[k];
+   STBI_ASSERT((((j->code_buffer) >> (32 - h->size[c])) & stbi__bmask[h->size[c]]) == h->code[c]);
+
+   // convert the id to a symbol
+   j->code_bits -= k;
+   j->code_buffer <<= k;
+   return h->values[c];
+}
+
+// bias[n] = (-1<<n) + 1
+static const int stbi__jbias[16] = {0,-1,-3,-7,-15,-31,-63,-127,-255,-511,-1023,-2047,-4095,-8191,-16383,-32767};
+
+// combined JPEG 'receive' and JPEG 'extend', since baseline
+// always extends everything it receives.
+stbi_inline static int stbi__extend_receive(stbi__jpeg *j, int n)
+{
+   unsigned int k;
+   int sgn;
+   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
+
+   sgn = j->code_buffer >> 31; // MSB of the buffer, i.e. first received bit: 1 means the value is non-negative, 0 means it is negative (bias added below)
+   k = stbi_lrot(j->code_buffer, n);
+   j->code_buffer = k & ~stbi__bmask[n];
+   k &= stbi__bmask[n];
+   j->code_bits -= n;
+   return k + (stbi__jbias[n] & (sgn - 1));
+}
+
+// get some unsigned bits
+stbi_inline static int stbi__jpeg_get_bits(stbi__jpeg *j, int n)
+{
+   unsigned int k;
+   if (j->code_bits < n) stbi__grow_buffer_unsafe(j);
+   k = stbi_lrot(j->code_buffer, n);
+   j->code_buffer = k & ~stbi__bmask[n];
+   k &= stbi__bmask[n];
+   j->code_bits -= n;
+   return k;
+}
+
+stbi_inline static int stbi__jpeg_get_bit(stbi__jpeg *j)
+{
+   unsigned int k;
+   if (j->code_bits < 1) stbi__grow_buffer_unsafe(j);
+   k = j->code_buffer;
+   j->code_buffer <<= 1;
+   --j->code_bits;
+   return k & 0x80000000;
+}
+
+// given a value that's at position X in the zigzag stream,
+// where does it appear in the 8x8 matrix coded as row-major?
+static const stbi_uc stbi__jpeg_dezigzag[64+15] =
+{
+    0,  1,  8, 16,  9,  2,  3, 10,
+   17, 24, 32, 25, 18, 11,  4,  5,
+   12, 19, 26, 33, 40, 48, 41, 34,
+   27, 20, 13,  6,  7, 14, 21, 28,
+   35, 42, 49, 56, 57, 50, 43, 36,
+   29, 22, 15, 23, 30, 37, 44, 51,
+   58, 59, 52, 45, 38, 31, 39, 46,
+   53, 60, 61, 54, 47, 55, 62, 63,
+   // let corrupt input sample past end
+   63, 63, 63, 63, 63, 63, 63, 63,
+   63, 63, 63, 63, 63, 63, 63
+};
+
+// decode one 64-entry block
+static int stbi__jpeg_decode_block(stbi__jpeg *j, short data[64], stbi__huffman *hdc, stbi__huffman *hac, stbi__int16 *fac, int b, stbi__uint16 *dequant)
+{
+   int diff,dc,k;
+   int t;
+
+   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+   t = stbi__jpeg_huff_decode(j, hdc);
+   if (t < 0 || t > 15) return stbi__err("bad huffman code","Corrupt JPEG");
+
+   // 0 all the ac values now so we can do it 32-bits at a time
+   memset(data,0,64*sizeof(data[0]));
+
+   diff = t ? stbi__extend_receive(j, t) : 0;
+   dc = j->img_comp[b].dc_pred + diff;
+   j->img_comp[b].dc_pred = dc;
+   data[0] = (short) (dc * dequant[0]);
+
+   // decode AC components, see JPEG spec
+   k = 1;
+   do {
+      unsigned int zig;
+      int c,r,s;
+      if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+      c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+      r = fac[c];
+      if (r) { // fast-AC path
+         k += (r >> 4) & 15; // run
+         s = r & 15; // combined length
+         j->code_buffer <<= s;
+         j->code_bits -= s;
+         // decode into unzigzag'd location
+         zig = stbi__jpeg_dezigzag[k++];
+         data[zig] = (short) ((r >> 8) * dequant[zig]);
+      } else {
+         int rs = stbi__jpeg_huff_decode(j, hac);
+         if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+         s = rs & 15;
+         r = rs >> 4;
+         if (s == 0) {
+            if (rs != 0xf0) break; // end block
+            k += 16;
+         } else {
+            k += r;
+            // decode into unzigzag'd location
+            zig = stbi__jpeg_dezigzag[k++];
+            data[zig] = (short) (stbi__extend_receive(j,s) * dequant[zig]);
+         }
+      }
+   } while (k < 64);
+   return 1;
+}
+
+static int stbi__jpeg_decode_block_prog_dc(stbi__jpeg *j, short data[64], stbi__huffman *hdc, int b)
+{
+   int diff,dc;
+   int t;
+   if (j->spec_end != 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+
+   if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+
+   if (j->succ_high == 0) {
+      // first scan for DC coefficient, must be first
+      memset(data,0,64*sizeof(data[0])); // 0 all the ac values now
+      t = stbi__jpeg_huff_decode(j, hdc);
+      if (t < 0 || t > 15) return stbi__err("bad huffman code", "Corrupt JPEG");
+      diff = t ? stbi__extend_receive(j, t) : 0;
+
+      dc = j->img_comp[b].dc_pred + diff;
+      j->img_comp[b].dc_pred = dc;
+      data[0] = (short) (dc * (1 << j->succ_low));
+   } else {
+      // refinement scan for DC coefficient
+      if (stbi__jpeg_get_bit(j))
+         data[0] += (short) (1 << j->succ_low);
+   }
+   return 1;
+}
+
+// @OPTIMIZE: store non-zigzagged during the decode passes,
+// and only de-zigzag when dequantizing
+static int stbi__jpeg_decode_block_prog_ac(stbi__jpeg *j, short data[64], stbi__huffman *hac, stbi__int16 *fac)
+{
+   int k;
+   if (j->spec_start == 0) return stbi__err("can't merge dc and ac", "Corrupt JPEG");
+
+   if (j->succ_high == 0) {
+      int shift = j->succ_low;
+
+      if (j->eob_run) {
+         --j->eob_run;
+         return 1;
+      }
+
+      k = j->spec_start;
+      do {
+         unsigned int zig;
+         int c,r,s;
+         if (j->code_bits < 16) stbi__grow_buffer_unsafe(j);
+         c = (j->code_buffer >> (32 - FAST_BITS)) & ((1 << FAST_BITS)-1);
+         r = fac[c];
+         if (r) { // fast-AC path
+            k += (r >> 4) & 15; // run
+            s = r & 15; // combined length
+            j->code_buffer <<= s;
+            j->code_bits -= s;
+            zig = stbi__jpeg_dezigzag[k++];
+            data[zig] = (short) ((r >> 8) * (1 << shift));
+         } else {
+            int rs = stbi__jpeg_huff_decode(j, hac);
+            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+            s = rs & 15;
+            r = rs >> 4;
+            if (s == 0) {
+               if (r < 15) {
+                  j->eob_run = (1 << r);
+                  if (r)
+                     j->eob_run += stbi__jpeg_get_bits(j, r);
+                  --j->eob_run;
+                  break;
+               }
+               k += 16;
+            } else {
+               k += r;
+               zig = stbi__jpeg_dezigzag[k++];
+               data[zig] = (short) (stbi__extend_receive(j,s) * (1 << shift));
+            }
+         }
+      } while (k <= j->spec_end);
+   } else {
+      // refinement scan for these AC coefficients
+
+      short bit = (short) (1 << j->succ_low);
+
+      if (j->eob_run) {
+         --j->eob_run;
+         for (k = j->spec_start; k <= j->spec_end; ++k) {
+            short *p = &data[stbi__jpeg_dezigzag[k]];
+            if (*p != 0)
+               if (stbi__jpeg_get_bit(j))
+                  if ((*p & bit)==0) {
+                     if (*p > 0)
+                        *p += bit;
+                     else
+                        *p -= bit;
+                  }
+         }
+      } else {
+         k = j->spec_start;
+         do {
+            int r,s;
+            int rs = stbi__jpeg_huff_decode(j, hac); // @OPTIMIZE see if we can use the fast path here, advance-by-r is so slow, eh
+            if (rs < 0) return stbi__err("bad huffman code","Corrupt JPEG");
+            s = rs & 15;
+            r = rs >> 4;
+            if (s == 0) {
+               if (r < 15) {
+                  j->eob_run = (1 << r) - 1;
+                  if (r)
+                     j->eob_run += stbi__jpeg_get_bits(j, r);
+                  r = 64; // force end of block
+               } else {
+                  // r=15 s=0 should write 16 0s, so we just do
+                  // a run of 15 0s and then write s (which is 0),
+                  // so we don't have to do anything special here
+               }
+            } else {
+               if (s != 1) return stbi__err("bad huffman code", "Corrupt JPEG");
+               // sign bit
+               if (stbi__jpeg_get_bit(j))
+                  s = bit;
+               else
+                  s = -bit;
+            }
+
+            // advance by r
+            while (k <= j->spec_end) {
+               short *p = &data[stbi__jpeg_dezigzag[k++]];
+               if (*p != 0) {
+                  if (stbi__jpeg_get_bit(j))
+                     if ((*p & bit)==0) {
+                        if (*p > 0)
+                           *p += bit;
+                        else
+                           *p -= bit;
+                     }
+               } else {
+                  if (r == 0) {
+                     *p = (short) s;
+                     break;
+                  }
+                  --r;
+               }
+            }
+         } while (k <= j->spec_end);
+      }
+   }
+   return 1;
+}
+
+// clamp an int to 0..255; the caller has already added the +128 level shift (see stbi__idct_block)
+stbi_inline static stbi_uc stbi__clamp(int x)
+{
+   // trick to use a single test to catch both cases
+   if ((unsigned int) x > 255) {
+      if (x < 0) return 0;
+      if (x > 255) return 255;
+   }
+   return (stbi_uc) x;
+}
+
+#define stbi__f2f(x)  ((int) (((x) * 4096 + 0.5)))
+#define stbi__fsh(x)  ((x) * 4096)
+
+// derived from jidctint -- DCT_ISLOW
+#define STBI__IDCT_1D(s0,s1,s2,s3,s4,s5,s6,s7) \
+   int t0,t1,t2,t3,p1,p2,p3,p4,p5,x0,x1,x2,x3; \
+   p2 = s2;                                    \
+   p3 = s6;                                    \
+   p1 = (p2+p3) * stbi__f2f(0.5411961f);       \
+   t2 = p1 + p3*stbi__f2f(-1.847759065f);      \
+   t3 = p1 + p2*stbi__f2f( 0.765366865f);      \
+   p2 = s0;                                    \
+   p3 = s4;                                    \
+   t0 = stbi__fsh(p2+p3);                      \
+   t1 = stbi__fsh(p2-p3);                      \
+   x0 = t0+t3;                                 \
+   x3 = t0-t3;                                 \
+   x1 = t1+t2;                                 \
+   x2 = t1-t2;                                 \
+   t0 = s7;                                    \
+   t1 = s5;                                    \
+   t2 = s3;                                    \
+   t3 = s1;                                    \
+   p3 = t0+t2;                                 \
+   p4 = t1+t3;                                 \
+   p1 = t0+t3;                                 \
+   p2 = t1+t2;                                 \
+   p5 = (p3+p4)*stbi__f2f( 1.175875602f);      \
+   t0 = t0*stbi__f2f( 0.298631336f);           \
+   t1 = t1*stbi__f2f( 2.053119869f);           \
+   t2 = t2*stbi__f2f( 3.072711026f);           \
+   t3 = t3*stbi__f2f( 1.501321110f);           \
+   p1 = p5 + p1*stbi__f2f(-0.899976223f);      \
+   p2 = p5 + p2*stbi__f2f(-2.562915447f);      \
+   p3 = p3*stbi__f2f(-1.961570560f);           \
+   p4 = p4*stbi__f2f(-0.390180644f);           \
+   t3 += p1+p4;                                \
+   t2 += p2+p3;                                \
+   t1 += p2+p4;                                \
+   t0 += p1+p3;
+
+static void stbi__idct_block(stbi_uc *out, int out_stride, short data[64])
+{
+   int i,val[64],*v=val;
+   stbi_uc *o;
+   short *d = data;
+
+   // columns
+   for (i=0; i < 8; ++i,++d, ++v) {
+      // if all zeroes, shortcut -- this avoids dequantizing 0s and IDCTing
+      if (d[ 8]==0 && d[16]==0 && d[24]==0 && d[32]==0
+           && d[40]==0 && d[48]==0 && d[56]==0) {
+         //    no shortcut                 0     seconds
+         //    (1|2|3|4|5|6|7)==0          0     seconds
+         //    all separate               -0.047 seconds
+         //    1 && 2|3 && 4|5 && 6|7:    -0.047 seconds
+         int dcterm = d[0]*4;
+         v[0] = v[8] = v[16] = v[24] = v[32] = v[40] = v[48] = v[56] = dcterm;
+      } else {
+         STBI__IDCT_1D(d[ 0],d[ 8],d[16],d[24],d[32],d[40],d[48],d[56])
+         // constants scaled things up by 1<<12; let's bring them back
+         // down, but keep 2 extra bits of precision
+         x0 += 512; x1 += 512; x2 += 512; x3 += 512;
+         v[ 0] = (x0+t3) >> 10;
+         v[56] = (x0-t3) >> 10;
+         v[ 8] = (x1+t2) >> 10;
+         v[48] = (x1-t2) >> 10;
+         v[16] = (x2+t1) >> 10;
+         v[40] = (x2-t1) >> 10;
+         v[24] = (x3+t0) >> 10;
+         v[32] = (x3-t0) >> 10;
+      }
+   }
+
+   for (i=0, v=val, o=out; i < 8; ++i,v+=8,o+=out_stride) {
+      // no fast case since the first 1D IDCT spread components out
+      STBI__IDCT_1D(v[0],v[1],v[2],v[3],v[4],v[5],v[6],v[7])
+      // constants scaled things up by 1<<12, plus we had 1<<2 from first
+      // loop, plus horizontal and vertical each scale by sqrt(8) so together
+      // we've got an extra 1<<3, so 1<<17 total we need to remove.
+      // so we want to round that, which means adding 0.5 * 1<<17,
+      // aka 65536. Also, we'll end up with -128 to 127 that we want
+      // to encode as 0..255 by adding 128, so we'll add that before the shift
+      x0 += 65536 + (128<<17);
+      x1 += 65536 + (128<<17);
+      x2 += 65536 + (128<<17);
+      x3 += 65536 + (128<<17);
+      // tried computing the shifts into temps, or'ing the temps to see
+      // if any were out of range, but that was slower
+      o[0] = stbi__clamp((x0+t3) >> 17);
+      o[7] = stbi__clamp((x0-t3) >> 17);
+      o[1] = stbi__clamp((x1+t2) >> 17);
+      o[6] = stbi__clamp((x1-t2) >> 17);
+      o[2] = stbi__clamp((x2+t1) >> 17);
+      o[5] = stbi__clamp((x2-t1) >> 17);
+      o[3] = stbi__clamp((x3+t0) >> 17);
+      o[4] = stbi__clamp((x3-t0) >> 17);
+   }
+}
+
+#ifdef STBI_SSE2
+// sse2 integer IDCT. not the fastest possible implementation but it
+// produces bit-identical results to the generic C version so it's
+// fully "transparent".
+static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
+{
+   // This is constructed to match our regular (generic) integer IDCT exactly.
+   __m128i row0, row1, row2, row3, row4, row5, row6, row7;
+   __m128i tmp;
+
+   // dot product constant: even elems=x, odd elems=y
+   #define dct_const(x,y)  _mm_setr_epi16((x),(y),(x),(y),(x),(y),(x),(y))
+
+   // out(0) = c0[even]*x + c0[odd]*y   (c0, x, y 16-bit, out 32-bit)
+   // out(1) = c1[even]*x + c1[odd]*y
+   #define dct_rot(out0,out1, x,y,c0,c1) \
+      __m128i c0##lo = _mm_unpacklo_epi16((x),(y)); \
+      __m128i c0##hi = _mm_unpackhi_epi16((x),(y)); \
+      __m128i out0##_l = _mm_madd_epi16(c0##lo, c0); \
+      __m128i out0##_h = _mm_madd_epi16(c0##hi, c0); \
+      __m128i out1##_l = _mm_madd_epi16(c0##lo, c1); \
+      __m128i out1##_h = _mm_madd_epi16(c0##hi, c1)
+
+   // out = in << 12  (in 16-bit, out 32-bit)
+   #define dct_widen(out, in) \
+      __m128i out##_l = _mm_srai_epi32(_mm_unpacklo_epi16(_mm_setzero_si128(), (in)), 4); \
+      __m128i out##_h = _mm_srai_epi32(_mm_unpackhi_epi16(_mm_setzero_si128(), (in)), 4)
+
+   // wide add
+   #define dct_wadd(out, a, b) \
+      __m128i out##_l = _mm_add_epi32(a##_l, b##_l); \
+      __m128i out##_h = _mm_add_epi32(a##_h, b##_h)
+
+   // wide sub
+   #define dct_wsub(out, a, b) \
+      __m128i out##_l = _mm_sub_epi32(a##_l, b##_l); \
+      __m128i out##_h = _mm_sub_epi32(a##_h, b##_h)
+
+   // butterfly a/b, add bias, then shift by "s" and pack
+   #define dct_bfly32o(out0, out1, a,b,bias,s) \
+      { \
+         __m128i abiased_l = _mm_add_epi32(a##_l, bias); \
+         __m128i abiased_h = _mm_add_epi32(a##_h, bias); \
+         dct_wadd(sum, abiased, b); \
+         dct_wsub(dif, abiased, b); \
+         out0 = _mm_packs_epi32(_mm_srai_epi32(sum_l, s), _mm_srai_epi32(sum_h, s)); \
+         out1 = _mm_packs_epi32(_mm_srai_epi32(dif_l, s), _mm_srai_epi32(dif_h, s)); \
+      }
+
+   // 8-bit interleave step (for transposes)
+   #define dct_interleave8(a, b) \
+      tmp = a; \
+      a = _mm_unpacklo_epi8(a, b); \
+      b = _mm_unpackhi_epi8(tmp, b)
+
+   // 16-bit interleave step (for transposes)
+   #define dct_interleave16(a, b) \
+      tmp = a; \
+      a = _mm_unpacklo_epi16(a, b); \
+      b = _mm_unpackhi_epi16(tmp, b)
+
+   #define dct_pass(bias,shift) \
+      { \
+         /* even part */ \
+         dct_rot(t2e,t3e, row2,row6, rot0_0,rot0_1); \
+         __m128i sum04 = _mm_add_epi16(row0, row4); \
+         __m128i dif04 = _mm_sub_epi16(row0, row4); \
+         dct_widen(t0e, sum04); \
+         dct_widen(t1e, dif04); \
+         dct_wadd(x0, t0e, t3e); \
+         dct_wsub(x3, t0e, t3e); \
+         dct_wadd(x1, t1e, t2e); \
+         dct_wsub(x2, t1e, t2e); \
+         /* odd part */ \
+         dct_rot(y0o,y2o, row7,row3, rot2_0,rot2_1); \
+         dct_rot(y1o,y3o, row5,row1, rot3_0,rot3_1); \
+         __m128i sum17 = _mm_add_epi16(row1, row7); \
+         __m128i sum35 = _mm_add_epi16(row3, row5); \
+         dct_rot(y4o,y5o, sum17,sum35, rot1_0,rot1_1); \
+         dct_wadd(x4, y0o, y4o); \
+         dct_wadd(x5, y1o, y5o); \
+         dct_wadd(x6, y2o, y5o); \
+         dct_wadd(x7, y3o, y4o); \
+         dct_bfly32o(row0,row7, x0,x7,bias,shift); \
+         dct_bfly32o(row1,row6, x1,x6,bias,shift); \
+         dct_bfly32o(row2,row5, x2,x5,bias,shift); \
+         dct_bfly32o(row3,row4, x3,x4,bias,shift); \
+      }
+
+   __m128i rot0_0 = dct_const(stbi__f2f(0.5411961f), stbi__f2f(0.5411961f) + stbi__f2f(-1.847759065f));
+   __m128i rot0_1 = dct_const(stbi__f2f(0.5411961f) + stbi__f2f( 0.765366865f), stbi__f2f(0.5411961f));
+   __m128i rot1_0 = dct_const(stbi__f2f(1.175875602f) + stbi__f2f(-0.899976223f), stbi__f2f(1.175875602f));
+   __m128i rot1_1 = dct_const(stbi__f2f(1.175875602f), stbi__f2f(1.175875602f) + stbi__f2f(-2.562915447f));
+   __m128i rot2_0 = dct_const(stbi__f2f(-1.961570560f) + stbi__f2f( 0.298631336f), stbi__f2f(-1.961570560f));
+   __m128i rot2_1 = dct_const(stbi__f2f(-1.961570560f), stbi__f2f(-1.961570560f) + stbi__f2f( 3.072711026f));
+   __m128i rot3_0 = dct_const(stbi__f2f(-0.390180644f) + stbi__f2f( 2.053119869f), stbi__f2f(-0.390180644f));
+   __m128i rot3_1 = dct_const(stbi__f2f(-0.390180644f), stbi__f2f(-0.390180644f) + stbi__f2f( 1.501321110f));
+
+   // rounding biases in column/row passes, see stbi__idct_block for explanation.
+   __m128i bias_0 = _mm_set1_epi32(512);
+   __m128i bias_1 = _mm_set1_epi32(65536 + (128<<17));
+
+   // load
+   row0 = _mm_load_si128((const __m128i *) (data + 0*8));
+   row1 = _mm_load_si128((const __m128i *) (data + 1*8));
+   row2 = _mm_load_si128((const __m128i *) (data + 2*8));
+   row3 = _mm_load_si128((const __m128i *) (data + 3*8));
+   row4 = _mm_load_si128((const __m128i *) (data + 4*8));
+   row5 = _mm_load_si128((const __m128i *) (data + 5*8));
+   row6 = _mm_load_si128((const __m128i *) (data + 6*8));
+   row7 = _mm_load_si128((const __m128i *) (data + 7*8));
+
+   // column pass
+   dct_pass(bias_0, 10);
+
+   {
+      // 16bit 8x8 transpose pass 1
+      dct_interleave16(row0, row4);
+      dct_interleave16(row1, row5);
+      dct_interleave16(row2, row6);
+      dct_interleave16(row3, row7);
+
+      // transpose pass 2
+      dct_interleave16(row0, row2);
+      dct_interleave16(row1, row3);
+      dct_interleave16(row4, row6);
+      dct_interleave16(row5, row7);
+
+      // transpose pass 3
+      dct_interleave16(row0, row1);
+      dct_interleave16(row2, row3);
+      dct_interleave16(row4, row5);
+      dct_interleave16(row6, row7);
+   }
+
+   // row pass
+   dct_pass(bias_1, 17);
+
+   {
+      // pack
+      __m128i p0 = _mm_packus_epi16(row0, row1); // a0a1a2a3...a7b0b1b2b3...b7
+      __m128i p1 = _mm_packus_epi16(row2, row3);
+      __m128i p2 = _mm_packus_epi16(row4, row5);
+      __m128i p3 = _mm_packus_epi16(row6, row7);
+
+      // 8bit 8x8 transpose pass 1
+      dct_interleave8(p0, p2); // a0e0a1e1...
+      dct_interleave8(p1, p3); // c0g0c1g1...
+
+      // transpose pass 2
+      dct_interleave8(p0, p1); // a0c0e0g0...
+      dct_interleave8(p2, p3); // b0d0f0h0...
+
+      // transpose pass 3
+      dct_interleave8(p0, p2); // a0b0c0d0...
+      dct_interleave8(p1, p3); // a4b4c4d4...
+
+      // store
+      _mm_storel_epi64((__m128i *) out, p0); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p0, 0x4e)); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, p2); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p2, 0x4e)); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, p1); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p1, 0x4e)); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, p3); out += out_stride;
+      _mm_storel_epi64((__m128i *) out, _mm_shuffle_epi32(p3, 0x4e));
+   }
+
+#undef dct_const
+#undef dct_rot
+#undef dct_widen
+#undef dct_wadd
+#undef dct_wsub
+#undef dct_bfly32o
+#undef dct_interleave8
+#undef dct_interleave16
+#undef dct_pass
+}
+
+#endif // STBI_SSE2
+
+#ifdef STBI_NEON
+
+// NEON integer IDCT. should produce bit-identical
+// results to the generic C version.
+static void stbi__idct_simd(stbi_uc *out, int out_stride, short data[64])
+{
+   int16x8_t row0, row1, row2, row3, row4, row5, row6, row7;
+
+   int16x4_t rot0_0 = vdup_n_s16(stbi__f2f(0.5411961f));
+   int16x4_t rot0_1 = vdup_n_s16(stbi__f2f(-1.847759065f));
+   int16x4_t rot0_2 = vdup_n_s16(stbi__f2f( 0.765366865f));
+   int16x4_t rot1_0 = vdup_n_s16(stbi__f2f( 1.175875602f));
+   int16x4_t rot1_1 = vdup_n_s16(stbi__f2f(-0.899976223f));
+   int16x4_t rot1_2 = vdup_n_s16(stbi__f2f(-2.562915447f));
+   int16x4_t rot2_0 = vdup_n_s16(stbi__f2f(-1.961570560f));
+   int16x4_t rot2_1 = vdup_n_s16(stbi__f2f(-0.390180644f));
+   int16x4_t rot3_0 = vdup_n_s16(stbi__f2f( 0.298631336f));
+   int16x4_t rot3_1 = vdup_n_s16(stbi__f2f( 2.053119869f));
+   int16x4_t rot3_2 = vdup_n_s16(stbi__f2f( 3.072711026f));
+   int16x4_t rot3_3 = vdup_n_s16(stbi__f2f( 1.501321110f));
+
+#define dct_long_mul(out, inq, coeff) \
+   int32x4_t out##_l = vmull_s16(vget_low_s16(inq), coeff); \
+   int32x4_t out##_h = vmull_s16(vget_high_s16(inq), coeff)
+
+#define dct_long_mac(out, acc, inq, coeff) \
+   int32x4_t out##_l = vmlal_s16(acc##_l, vget_low_s16(inq), coeff); \
+   int32x4_t out##_h = vmlal_s16(acc##_h, vget_high_s16(inq), coeff)
+
+#define dct_widen(out, inq) \
+   int32x4_t out##_l = vshll_n_s16(vget_low_s16(inq), 12); \
+   int32x4_t out##_h = vshll_n_s16(vget_high_s16(inq), 12)
+
+// wide add
+#define dct_wadd(out, a, b) \
+   int32x4_t out##_l = vaddq_s32(a##_l, b##_l); \
+   int32x4_t out##_h = vaddq_s32(a##_h, b##_h)
+
+// wide sub
+#define dct_wsub(out, a, b) \
+   int32x4_t out##_l = vsubq_s32(a##_l, b##_l); \
+   int32x4_t out##_h = vsubq_s32(a##_h, b##_h)
+
+// butterfly a/b, then shift using "shiftop" by "s" and pack
+#define dct_bfly32o(out0,out1, a,b,shiftop,s) \
+   { \
+      dct_wadd(sum, a, b); \
+      dct_wsub(dif, a, b); \
+      out0 = vcombine_s16(shiftop(sum_l, s), shiftop(sum_h, s)); \
+      out1 = vcombine_s16(shiftop(dif_l, s), shiftop(dif_h, s)); \
+   }
+
+#define dct_pass(shiftop, shift) \
+   { \
+      /* even part */ \
+      int16x8_t sum26 = vaddq_s16(row2, row6); \
+      dct_long_mul(p1e, sum26, rot0_0); \
+      dct_long_mac(t2e, p1e, row6, rot0_1); \
+      dct_long_mac(t3e, p1e, row2, rot0_2); \
+      int16x8_t sum04 = vaddq_s16(row0, row4); \
+      int16x8_t dif04 = vsubq_s16(row0, row4); \
+      dct_widen(t0e, sum04); \
+      dct_widen(t1e, dif04); \
+      dct_wadd(x0, t0e, t3e); \
+      dct_wsub(x3, t0e, t3e); \
+      dct_wadd(x1, t1e, t2e); \
+      dct_wsub(x2, t1e, t2e); \
+      /* odd part */ \
+      int16x8_t sum15 = vaddq_s16(row1, row5); \
+      int16x8_t sum17 = vaddq_s16(row1, row7); \
+      int16x8_t sum35 = vaddq_s16(row3, row5); \
+      int16x8_t sum37 = vaddq_s16(row3, row7); \
+      int16x8_t sumodd = vaddq_s16(sum17, sum35); \
+      dct_long_mul(p5o, sumodd, rot1_0); \
+      dct_long_mac(p1o, p5o, sum17, rot1_1); \
+      dct_long_mac(p2o, p5o, sum35, rot1_2); \
+      dct_long_mul(p3o, sum37, rot2_0); \
+      dct_long_mul(p4o, sum15, rot2_1); \
+      dct_wadd(sump13o, p1o, p3o); \
+      dct_wadd(sump24o, p2o, p4o); \
+      dct_wadd(sump23o, p2o, p3o); \
+      dct_wadd(sump14o, p1o, p4o); \
+      dct_long_mac(x4, sump13o, row7, rot3_0); \
+      dct_long_mac(x5, sump24o, row5, rot3_1); \
+      dct_long_mac(x6, sump23o, row3, rot3_2); \
+      dct_long_mac(x7, sump14o, row1, rot3_3); \
+      dct_bfly32o(row0,row7, x0,x7,shiftop,shift); \
+      dct_bfly32o(row1,row6, x1,x6,shiftop,shift); \
+      dct_bfly32o(row2,row5, x2,x5,shiftop,shift); \
+      dct_bfly32o(row3,row4, x3,x4,shiftop,shift); \
+   }
+
+   // load
+   row0 = vld1q_s16(data + 0*8);
+   row1 = vld1q_s16(data + 1*8);
+   row2 = vld1q_s16(data + 2*8);
+   row3 = vld1q_s16(data + 3*8);
+   row4 = vld1q_s16(data + 4*8);
+   row5 = vld1q_s16(data + 5*8);
+   row6 = vld1q_s16(data + 6*8);
+   row7 = vld1q_s16(data + 7*8);
+
+   // add DC bias
+   row0 = vaddq_s16(row0, vsetq_lane_s16(1024, vdupq_n_s16(0), 0));
+
+   // column pass
+   dct_pass(vrshrn_n_s32, 10);
+
+   // 16bit 8x8 transpose
+   {
+// these three map to a single VTRN.16, VTRN.32, and VSWP, respectively.
+// whether compilers actually get this is another story, sadly.
+#define dct_trn16(x, y) { int16x8x2_t t = vtrnq_s16(x, y); x = t.val[0]; y = t.val[1]; }
+#define dct_trn32(x, y) { int32x4x2_t t = vtrnq_s32(vreinterpretq_s32_s16(x), vreinterpretq_s32_s16(y)); x = vreinterpretq_s16_s32(t.val[0]); y = vreinterpretq_s16_s32(t.val[1]); }
+#define dct_trn64(x, y) { int16x8_t x0 = x; int16x8_t y0 = y; x = vcombine_s16(vget_low_s16(x0), vget_low_s16(y0)); y = vcombine_s16(vget_high_s16(x0), vget_high_s16(y0)); }
+
+      // pass 1
+      dct_trn16(row0, row1); // a0b0a2b2a4b4a6b6
+      dct_trn16(row2, row3);
+      dct_trn16(row4, row5);
+      dct_trn16(row6, row7);
+
+      // pass 2
+      dct_trn32(row0, row2); // a0b0c0d0a4b4c4d4
+      dct_trn32(row1, row3);
+      dct_trn32(row4, row6);
+      dct_trn32(row5, row7);
+
+      // pass 3
+      dct_trn64(row0, row4); // a0b0c0d0e0f0g0h0
+      dct_trn64(row1, row5);
+      dct_trn64(row2, row6);
+      dct_trn64(row3, row7);
+
+#undef dct_trn16
+#undef dct_trn32
+#undef dct_trn64
+   }
+
+   // row pass
+   // vrshrn_n_s32 only supports shifts up to 16, we need
+   // 17. so do a non-rounding shift of 16 first then follow
+   // up with a rounding shift by 1.
+   dct_pass(vshrn_n_s32, 16);
+
+   {
+      // pack and round
+      uint8x8_t p0 = vqrshrun_n_s16(row0, 1);
+      uint8x8_t p1 = vqrshrun_n_s16(row1, 1);
+      uint8x8_t p2 = vqrshrun_n_s16(row2, 1);
+      uint8x8_t p3 = vqrshrun_n_s16(row3, 1);
+      uint8x8_t p4 = vqrshrun_n_s16(row4, 1);
+      uint8x8_t p5 = vqrshrun_n_s16(row5, 1);
+      uint8x8_t p6 = vqrshrun_n_s16(row6, 1);
+      uint8x8_t p7 = vqrshrun_n_s16(row7, 1);
+
+      // again, these can translate into one instruction, but often don't.
+#define dct_trn8_8(x, y) { uint8x8x2_t t = vtrn_u8(x, y); x = t.val[0]; y = t.val[1]; }
+#define dct_trn8_16(x, y) { uint16x4x2_t t = vtrn_u16(vreinterpret_u16_u8(x), vreinterpret_u16_u8(y)); x = vreinterpret_u8_u16(t.val[0]); y = vreinterpret_u8_u16(t.val[1]); }
+#define dct_trn8_32(x, y) { uint32x2x2_t t = vtrn_u32(vreinterpret_u32_u8(x), vreinterpret_u32_u8(y)); x = vreinterpret_u8_u32(t.val[0]); y = vreinterpret_u8_u32(t.val[1]); }
+
+      // sadly can't use interleaved stores here since we only write
+      // 8 bytes to each scan line!
+
+      // 8x8 8-bit transpose pass 1
+      dct_trn8_8(p0, p1);
+      dct_trn8_8(p2, p3);
+      dct_trn8_8(p4, p5);
+      dct_trn8_8(p6, p7);
+
+      // pass 2
+      dct_trn8_16(p0, p2);
+      dct_trn8_16(p1, p3);
+      dct_trn8_16(p4, p6);
+      dct_trn8_16(p5, p7);
+
+      // pass 3
+      dct_trn8_32(p0, p4);
+      dct_trn8_32(p1, p5);
+      dct_trn8_32(p2, p6);
+      dct_trn8_32(p3, p7);
+
+      // store
+      vst1_u8(out, p0); out += out_stride;
+      vst1_u8(out, p1); out += out_stride;
+      vst1_u8(out, p2); out += out_stride;
+      vst1_u8(out, p3); out += out_stride;
+      vst1_u8(out, p4); out += out_stride;
+      vst1_u8(out, p5); out += out_stride;
+      vst1_u8(out, p6); out += out_stride;
+      vst1_u8(out, p7);
+
+#undef dct_trn8_8
+#undef dct_trn8_16
+#undef dct_trn8_32
+   }
+
+#undef dct_long_mul
+#undef dct_long_mac
+#undef dct_widen
+#undef dct_wadd
+#undef dct_wsub
+#undef dct_bfly32o
+#undef dct_pass
+}
+
+#endif // STBI_NEON
+
+#define STBI__MARKER_none  0xff
+// if there's a pending marker from the entropy stream, return that
+// otherwise, fetch from the stream and get a marker. if there's no
+// marker, return 0xff, which is never a valid marker value
+static stbi_uc stbi__get_marker(stbi__jpeg *j)
+{
+   stbi_uc x;
+   if (j->marker != STBI__MARKER_none) { x = j->marker; j->marker = STBI__MARKER_none; return x; }
+   x = stbi__get8(j->s);
+   if (x != 0xff) return STBI__MARKER_none;
+   while (x == 0xff)
+      x = stbi__get8(j->s); // consume repeated 0xff fill bytes
+   return x;
+}
+
+// in each scan, we'll have scan_n components, and the order
+// of the components is specified by order[]
+#define STBI__RESTART(x)     ((x) >= 0xd0 && (x) <= 0xd7)
+
+// after a restart interval, reset the entropy decoder and
+// the dc prediction
+static void stbi__jpeg_reset(stbi__jpeg *j)
+{
+   j->code_bits = 0;
+   j->code_buffer = 0;
+   j->nomore = 0;
+   j->img_comp[0].dc_pred = j->img_comp[1].dc_pred = j->img_comp[2].dc_pred = j->img_comp[3].dc_pred = 0;
+   j->marker = STBI__MARKER_none;
+   j->todo = j->restart_interval ? j->restart_interval : 0x7fffffff;
+   j->eob_run = 0;
+   // no more than 1<<31 MCUs if no restart_interval? that's plenty safe,
+   // since we don't even allow 1<<30 pixels
+}
+
+static int stbi__parse_entropy_coded_data(stbi__jpeg *z)
+{
+   stbi__jpeg_reset(z);
+   if (!z->progressive) {
+      if (z->scan_n == 1) {
+         int i,j;
+         STBI_SIMD_ALIGN(short, data[64]);
+         int n = z->order[0];
+         // non-interleaved data, we just need to process one block at a time,
+         // in trivial scanline order
+         // number of blocks to do just depends on how many actual "pixels" this
+         // component has, independent of interleaved MCU blocking and such
+         int w = (z->img_comp[n].x+7) >> 3;
+         int h = (z->img_comp[n].y+7) >> 3;
+         for (j=0; j < h; ++j) {
+            for (i=0; i < w; ++i) {
+               int ha = z->img_comp[n].ha;
+               if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
+               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
+               // every data block is an MCU, so count down the restart interval
+               if (--z->todo <= 0) {
+                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+                  // if it's NOT a restart, then just bail, so we get corrupt data
+                  // rather than no data
+                  if (!STBI__RESTART(z->marker)) return 1;
+                  stbi__jpeg_reset(z);
+               }
+            }
+         }
+         return 1;
+      } else { // interleaved
+         int i,j,k,x,y;
+         STBI_SIMD_ALIGN(short, data[64]);
+         for (j=0; j < z->img_mcu_y; ++j) {
+            for (i=0; i < z->img_mcu_x; ++i) {
+               // scan an interleaved mcu... process scan_n components in order
+               for (k=0; k < z->scan_n; ++k) {
+                  int n = z->order[k];
+                  // scan out an mcu's worth of this component; that's just determined
+                  // by the basic H and V specified for the component
+                  for (y=0; y < z->img_comp[n].v; ++y) {
+                     for (x=0; x < z->img_comp[n].h; ++x) {
+                        int x2 = (i*z->img_comp[n].h + x)*8;
+                        int y2 = (j*z->img_comp[n].v + y)*8;
+                        int ha = z->img_comp[n].ha;
+                        if (!stbi__jpeg_decode_block(z, data, z->huff_dc+z->img_comp[n].hd, z->huff_ac+ha, z->fast_ac[ha], n, z->dequant[z->img_comp[n].tq])) return 0;
+                        z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*y2+x2, z->img_comp[n].w2, data);
+                     }
+                  }
+               }
+               // after all interleaved components, that's an interleaved MCU,
+               // so now count down the restart interval
+               if (--z->todo <= 0) {
+                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+                  if (!STBI__RESTART(z->marker)) return 1;
+                  stbi__jpeg_reset(z);
+               }
+            }
+         }
+         return 1;
+      }
+   } else {
+      if (z->scan_n == 1) {
+         int i,j;
+         int n = z->order[0];
+         // non-interleaved data, we just need to process one block at a time,
+         // in trivial scanline order
+         // number of blocks to do just depends on how many actual "pixels" this
+         // component has, independent of interleaved MCU blocking and such
+         int w = (z->img_comp[n].x+7) >> 3;
+         int h = (z->img_comp[n].y+7) >> 3;
+         for (j=0; j < h; ++j) {
+            for (i=0; i < w; ++i) {
+               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
+               if (z->spec_start == 0) {
+                  if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
+                     return 0;
+               } else {
+                  int ha = z->img_comp[n].ha;
+                  if (!stbi__jpeg_decode_block_prog_ac(z, data, &z->huff_ac[ha], z->fast_ac[ha]))
+                     return 0;
+               }
+               // every data block is an MCU, so count down the restart interval
+               if (--z->todo <= 0) {
+                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+                  if (!STBI__RESTART(z->marker)) return 1;
+                  stbi__jpeg_reset(z);
+               }
+            }
+         }
+         return 1;
+      } else { // interleaved
+         int i,j,k,x,y;
+         for (j=0; j < z->img_mcu_y; ++j) {
+            for (i=0; i < z->img_mcu_x; ++i) {
+               // scan an interleaved mcu... process scan_n components in order
+               for (k=0; k < z->scan_n; ++k) {
+                  int n = z->order[k];
+                  // scan out an mcu's worth of this component; that's just determined
+                  // by the basic H and V specified for the component
+                  for (y=0; y < z->img_comp[n].v; ++y) {
+                     for (x=0; x < z->img_comp[n].h; ++x) {
+                        int x2 = (i*z->img_comp[n].h + x);
+                        int y2 = (j*z->img_comp[n].v + y);
+                        short *data = z->img_comp[n].coeff + 64 * (x2 + y2 * z->img_comp[n].coeff_w);
+                        if (!stbi__jpeg_decode_block_prog_dc(z, data, &z->huff_dc[z->img_comp[n].hd], n))
+                           return 0;
+                     }
+                  }
+               }
+               // after all interleaved components, that's an interleaved MCU,
+               // so now count down the restart interval
+               if (--z->todo <= 0) {
+                  if (z->code_bits < 24) stbi__grow_buffer_unsafe(z);
+                  if (!STBI__RESTART(z->marker)) return 1;
+                  stbi__jpeg_reset(z);
+               }
+            }
+         }
+         return 1;
+      }
+   }
+}
+
+static void stbi__jpeg_dequantize(short *data, stbi__uint16 *dequant)
+{
+   int i;
+   for (i=0; i < 64; ++i)
+      data[i] *= dequant[i];
+}
+
+static void stbi__jpeg_finish(stbi__jpeg *z)
+{
+   if (z->progressive) {
+      // dequantize and idct the data
+      int i,j,n;
+      for (n=0; n < z->s->img_n; ++n) {
+         int w = (z->img_comp[n].x+7) >> 3;
+         int h = (z->img_comp[n].y+7) >> 3;
+         for (j=0; j < h; ++j) {
+            for (i=0; i < w; ++i) {
+               short *data = z->img_comp[n].coeff + 64 * (i + j * z->img_comp[n].coeff_w);
+               stbi__jpeg_dequantize(data, z->dequant[z->img_comp[n].tq]);
+               z->idct_block_kernel(z->img_comp[n].data+z->img_comp[n].w2*j*8+i*8, z->img_comp[n].w2, data);
+            }
+         }
+      }
+   }
+}
+
+static int stbi__process_marker(stbi__jpeg *z, int m)
+{
+   int L;
+   switch (m) {
+      case STBI__MARKER_none: // no marker found
+         return stbi__err("expected marker","Corrupt JPEG");
+
+      case 0xDD: // DRI - specify restart interval
+         if (stbi__get16be(z->s) != 4) return stbi__err("bad DRI len","Corrupt JPEG");
+         z->restart_interval = stbi__get16be(z->s);
+         return 1;
+
+      case 0xDB: // DQT - define quantization table
+         L = stbi__get16be(z->s)-2;
+         while (L > 0) {
+            int q = stbi__get8(z->s);
+            int p = q >> 4, sixteen = (p != 0);
+            int t = q & 15,i;
+            if (p != 0 && p != 1) return stbi__err("bad DQT type","Corrupt JPEG");
+            if (t > 3) return stbi__err("bad DQT table","Corrupt JPEG");
+
+            for (i=0; i < 64; ++i)
+               z->dequant[t][stbi__jpeg_dezigzag[i]] = (stbi__uint16)(sixteen ? stbi__get16be(z->s) : stbi__get8(z->s));
+            L -= (sixteen ? 129 : 65);
+         }
+         return L==0;
+
+      case 0xC4: // DHT - define huffman table
+         L = stbi__get16be(z->s)-2;
+         while (L > 0) {
+            stbi_uc *v;
+            int sizes[16],i,n=0;
+            int q = stbi__get8(z->s);
+            int tc = q >> 4;
+            int th = q & 15;
+            if (tc > 1 || th > 3) return stbi__err("bad DHT header","Corrupt JPEG");
+            for (i=0; i < 16; ++i) {
+               sizes[i] = stbi__get8(z->s);
+               n += sizes[i];
+            }
+            L -= 17;
+            if (tc == 0) {
+               if (!stbi__build_huffman(z->huff_dc+th, sizes)) return 0;
+               v = z->huff_dc[th].values;
+            } else {
+               if (!stbi__build_huffman(z->huff_ac+th, sizes)) return 0;
+               v = z->huff_ac[th].values;
+            }
+            for (i=0; i < n; ++i)
+               v[i] = stbi__get8(z->s);
+            if (tc != 0)
+               stbi__build_fast_ac(z->fast_ac[th], z->huff_ac + th);
+            L -= n;
+         }
+         return L==0;
+   }
+
+   // check for comment block or APP blocks
+   if ((m >= 0xE0 && m <= 0xEF) || m == 0xFE) {
+      L = stbi__get16be(z->s);
+      if (L < 2) {
+         if (m == 0xFE)
+            return stbi__err("bad COM len","Corrupt JPEG");
+         else
+            return stbi__err("bad APP len","Corrupt JPEG");
+      }
+      L -= 2;
+
+      if (m == 0xE0 && L >= 5) { // JFIF APP0 segment
+         static const unsigned char tag[5] = {'J','F','I','F','\0'};
+         int ok = 1;
+         int i;
+         for (i=0; i < 5; ++i)
+            if (stbi__get8(z->s) != tag[i])
+               ok = 0;
+         L -= 5;
+         if (ok)
+            z->jfif = 1;
+      } else if (m == 0xEE && L >= 12) { // Adobe APP14 segment
+         static const unsigned char tag[6] = {'A','d','o','b','e','\0'};
+         int ok = 1;
+         int i;
+         for (i=0; i < 6; ++i)
+            if (stbi__get8(z->s) != tag[i])
+               ok = 0;
+         L -= 6;
+         if (ok) {
+            stbi__get8(z->s); // version
+            stbi__get16be(z->s); // flags0
+            stbi__get16be(z->s); // flags1
+            z->app14_color_transform = stbi__get8(z->s); // color transform
+            L -= 6;
+         }
+      }
+
+      stbi__skip(z->s, L);
+      return 1;
+   }
+
+   return stbi__err("unknown marker","Corrupt JPEG");
+}
+
+// after we see SOS
+static int stbi__process_scan_header(stbi__jpeg *z)
+{
+   int i;
+   int Ls = stbi__get16be(z->s);
+   z->scan_n = stbi__get8(z->s);
+   if (z->scan_n < 1 || z->scan_n > 4 || z->scan_n > (int) z->s->img_n) return stbi__err("bad SOS component count","Corrupt JPEG");
+   if (Ls != 6+2*z->scan_n) return stbi__err("bad SOS len","Corrupt JPEG");
+   for (i=0; i < z->scan_n; ++i) {
+      int id = stbi__get8(z->s), which;
+      int q = stbi__get8(z->s);
+      for (which = 0; which < z->s->img_n; ++which)
+         if (z->img_comp[which].id == id)
+            break;
+      if (which == z->s->img_n) return 0; // no match
+      z->img_comp[which].hd = q >> 4;   if (z->img_comp[which].hd > 3) return stbi__err("bad DC huff","Corrupt JPEG");
+      z->img_comp[which].ha = q & 15;   if (z->img_comp[which].ha > 3) return stbi__err("bad AC huff","Corrupt JPEG");
+      z->order[i] = which;
+   }
+
+   {
+      int aa;
+      z->spec_start = stbi__get8(z->s);
+      z->spec_end   = stbi__get8(z->s); // should be 63, but might be 0
+      aa = stbi__get8(z->s);
+      z->succ_high = (aa >> 4);
+      z->succ_low  = (aa & 15);
+      if (z->progressive) {
+         if (z->spec_start > 63 || z->spec_end > 63  || z->spec_start > z->spec_end || z->succ_high > 13 || z->succ_low > 13)
+            return stbi__err("bad SOS", "Corrupt JPEG");
+      } else {
+         if (z->spec_start != 0) return stbi__err("bad SOS","Corrupt JPEG");
+         if (z->succ_high != 0 || z->succ_low != 0) return stbi__err("bad SOS","Corrupt JPEG");
+         z->spec_end = 63;
+      }
+   }
+
+   return 1;
+}
+
+static int stbi__free_jpeg_components(stbi__jpeg *z, int ncomp, int why)
+{
+   int i;
+   for (i=0; i < ncomp; ++i) {
+      if (z->img_comp[i].raw_data) {
+         STBI_FREE(z->img_comp[i].raw_data);
+         z->img_comp[i].raw_data = NULL;
+         z->img_comp[i].data = NULL;
+      }
+      if (z->img_comp[i].raw_coeff) {
+         STBI_FREE(z->img_comp[i].raw_coeff);
+         z->img_comp[i].raw_coeff = 0;
+         z->img_comp[i].coeff = 0;
+      }
+      if (z->img_comp[i].linebuf) {
+         STBI_FREE(z->img_comp[i].linebuf);
+         z->img_comp[i].linebuf = NULL;
+      }
+   }
+   return why;
+}
+
+static int stbi__process_frame_header(stbi__jpeg *z, int scan)
+{
+   stbi__context *s = z->s;
+   int Lf,p,i,q, h_max=1,v_max=1,c;
+   Lf = stbi__get16be(s);         if (Lf < 11) return stbi__err("bad SOF len","Corrupt JPEG"); // JPEG
+   p  = stbi__get8(s);            if (p != 8) return stbi__err("only 8-bit","JPEG format not supported: 8-bit only"); // JPEG baseline
+   s->img_y = stbi__get16be(s);   if (s->img_y == 0) return stbi__err("no header height", "JPEG format not supported: delayed height"); // Legal, but we don't handle it--but neither does IJG
+   s->img_x = stbi__get16be(s);   if (s->img_x == 0) return stbi__err("0 width","Corrupt JPEG"); // JPEG requires
+   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+   c = stbi__get8(s);
+   if (c != 3 && c != 1 && c != 4) return stbi__err("bad component count","Corrupt JPEG");
+   s->img_n = c;
+   for (i=0; i < c; ++i) {
+      z->img_comp[i].data = NULL;
+      z->img_comp[i].linebuf = NULL;
+   }
+
+   if (Lf != 8+3*s->img_n) return stbi__err("bad SOF len","Corrupt JPEG");
+
+   z->rgb = 0;
+   for (i=0; i < s->img_n; ++i) {
+      static const unsigned char rgb[3] = { 'R', 'G', 'B' };
+      z->img_comp[i].id = stbi__get8(s);
+      if (s->img_n == 3 && z->img_comp[i].id == rgb[i])
+         ++z->rgb;
+      q = stbi__get8(s);
+      z->img_comp[i].h = (q >> 4);  if (!z->img_comp[i].h || z->img_comp[i].h > 4) return stbi__err("bad H","Corrupt JPEG");
+      z->img_comp[i].v = q & 15;    if (!z->img_comp[i].v || z->img_comp[i].v > 4) return stbi__err("bad V","Corrupt JPEG");
+      z->img_comp[i].tq = stbi__get8(s);  if (z->img_comp[i].tq > 3) return stbi__err("bad TQ","Corrupt JPEG");
+   }
+
+   if (scan != STBI__SCAN_load) return 1;
+
+   if (!stbi__mad3sizes_valid(s->img_x, s->img_y, s->img_n, 0)) return stbi__err("too large", "Image too large to decode");
+
+   for (i=0; i < s->img_n; ++i) {
+      if (z->img_comp[i].h > h_max) h_max = z->img_comp[i].h;
+      if (z->img_comp[i].v > v_max) v_max = z->img_comp[i].v;
+   }
+
+   // check that plane subsampling factors are integer ratios; our resamplers can't deal with fractional ratios
+   // and I've never seen a non-corrupted JPEG file actually use them
+   for (i=0; i < s->img_n; ++i) {
+      if (h_max % z->img_comp[i].h != 0) return stbi__err("bad H","Corrupt JPEG");
+      if (v_max % z->img_comp[i].v != 0) return stbi__err("bad V","Corrupt JPEG");
+   }
+
+   // compute interleaved mcu info
+   z->img_h_max = h_max;
+   z->img_v_max = v_max;
+   z->img_mcu_w = h_max * 8;
+   z->img_mcu_h = v_max * 8;
+   // these sizes can't be more than 17 bits
+   z->img_mcu_x = (s->img_x + z->img_mcu_w-1) / z->img_mcu_w;
+   z->img_mcu_y = (s->img_y + z->img_mcu_h-1) / z->img_mcu_h;
+
+   for (i=0; i < s->img_n; ++i) {
+      // number of effective pixels (e.g. for non-interleaved MCU)
+      z->img_comp[i].x = (s->img_x * z->img_comp[i].h + h_max-1) / h_max;
+      z->img_comp[i].y = (s->img_y * z->img_comp[i].v + v_max-1) / v_max;
+      // to simplify generation, we'll allocate enough memory to decode
+      // the bogus oversized data from using interleaved MCUs and their
+      // big blocks (e.g. a 16x16 iMCU on an image of width 33); we won't
+      // discard the extra data until colorspace conversion
+      //
+      // img_mcu_x, img_mcu_y: <=17 bits; comp[i].h and .v are <=4 (checked earlier)
+      // so these muls can't overflow with 32-bit ints (which we require)
+      z->img_comp[i].w2 = z->img_mcu_x * z->img_comp[i].h * 8;
+      z->img_comp[i].h2 = z->img_mcu_y * z->img_comp[i].v * 8;
+      z->img_comp[i].coeff = 0;
+      z->img_comp[i].raw_coeff = 0;
+      z->img_comp[i].linebuf = NULL;
+      z->img_comp[i].raw_data = stbi__malloc_mad2(z->img_comp[i].w2, z->img_comp[i].h2, 15);
+      if (z->img_comp[i].raw_data == NULL)
+         return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
+      // align blocks for idct using mmx/sse
+      z->img_comp[i].data = (stbi_uc*) (((size_t) z->img_comp[i].raw_data + 15) & ~15);
+      if (z->progressive) {
+         // w2, h2 are multiples of 8 (see above)
+         z->img_comp[i].coeff_w = z->img_comp[i].w2 / 8;
+         z->img_comp[i].coeff_h = z->img_comp[i].h2 / 8;
+         z->img_comp[i].raw_coeff = stbi__malloc_mad3(z->img_comp[i].w2, z->img_comp[i].h2, sizeof(short), 15);
+         if (z->img_comp[i].raw_coeff == NULL)
+            return stbi__free_jpeg_components(z, i+1, stbi__err("outofmem", "Out of memory"));
+         z->img_comp[i].coeff = (short*) (((size_t) z->img_comp[i].raw_coeff + 15) & ~15);
+      }
+   }
+
+   return 1;
+}
+
+// use comparisons since in some cases we handle more than one case (e.g. SOF)
+#define stbi__DNL(x)         ((x) == 0xdc)
+#define stbi__SOI(x)         ((x) == 0xd8)
+#define stbi__EOI(x)         ((x) == 0xd9)
+#define stbi__SOF(x)         ((x) == 0xc0 || (x) == 0xc1 || (x) == 0xc2)
+#define stbi__SOS(x)         ((x) == 0xda)
+
+#define stbi__SOF_progressive(x)   ((x) == 0xc2)
+
+static int stbi__decode_jpeg_header(stbi__jpeg *z, int scan)
+{
+   int m;
+   z->jfif = 0;
+   z->app14_color_transform = -1; // valid values are 0,1,2
+   z->marker = STBI__MARKER_none; // initialize cached marker to empty
+   m = stbi__get_marker(z);
+   if (!stbi__SOI(m)) return stbi__err("no SOI","Corrupt JPEG");
+   if (scan == STBI__SCAN_type) return 1;
+   m = stbi__get_marker(z);
+   while (!stbi__SOF(m)) {
+      if (!stbi__process_marker(z,m)) return 0;
+      m = stbi__get_marker(z);
+      while (m == STBI__MARKER_none) {
+         // some files have extra padding after their blocks, so ok, we'll scan
+         if (stbi__at_eof(z->s)) return stbi__err("no SOF", "Corrupt JPEG");
+         m = stbi__get_marker(z);
+      }
+   }
+   z->progressive = stbi__SOF_progressive(m);
+   if (!stbi__process_frame_header(z, scan)) return 0;
+   return 1;
+}
+
+// decode image to YCbCr format
+static int stbi__decode_jpeg_image(stbi__jpeg *j)
+{
+   int m;
+   for (m = 0; m < 4; m++) {
+      j->img_comp[m].raw_data = NULL;
+      j->img_comp[m].raw_coeff = NULL;
+   }
+   j->restart_interval = 0;
+   if (!stbi__decode_jpeg_header(j, STBI__SCAN_load)) return 0;
+   m = stbi__get_marker(j);
+   while (!stbi__EOI(m)) {
+      if (stbi__SOS(m)) {
+         if (!stbi__process_scan_header(j)) return 0;
+         if (!stbi__parse_entropy_coded_data(j)) return 0;
+         if (j->marker == STBI__MARKER_none ) {
+            // handle 0s at the end of image data from IP Kamera 9060
+            while (!stbi__at_eof(j->s)) {
+               int x = stbi__get8(j->s);
+               if (x == 255) {
+                  j->marker = stbi__get8(j->s);
+                  break;
+               }
+            }
+            // if we reach eof without hitting a marker, stbi__get_marker() below will fail and we'll eventually return 0
+         }
+      } else if (stbi__DNL(m)) {
+         int Ld = stbi__get16be(j->s);
+         stbi__uint32 NL = stbi__get16be(j->s);
+         if (Ld != 4) return stbi__err("bad DNL len", "Corrupt JPEG");
+         if (NL != j->s->img_y) return stbi__err("bad DNL height", "Corrupt JPEG");
+      } else {
+         if (!stbi__process_marker(j, m)) return 0;
+      }
+      m = stbi__get_marker(j);
+   }
+   if (j->progressive)
+      stbi__jpeg_finish(j);
+   return 1;
+}
+
+// static jfif-centered resampling (across block boundaries)
+
+typedef stbi_uc *(*resample_row_func)(stbi_uc *out, stbi_uc *in0, stbi_uc *in1,
+                                    int w, int hs);
+
+#define stbi__div4(x) ((stbi_uc) ((x) >> 2))
+
+static stbi_uc *resample_row_1(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   STBI_NOTUSED(out);
+   STBI_NOTUSED(in_far);
+   STBI_NOTUSED(w);
+   STBI_NOTUSED(hs);
+   return in_near;
+}
+
+static stbi_uc* stbi__resample_row_v_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   // need to generate two samples vertically for every one in input
+   int i;
+   STBI_NOTUSED(hs);
+   for (i=0; i < w; ++i)
+      out[i] = stbi__div4(3*in_near[i] + in_far[i] + 2);
+   return out;
+}
+
+static stbi_uc*  stbi__resample_row_h_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   // need to generate two samples horizontally for every one in input
+   int i;
+   stbi_uc *input = in_near;
+
+   if (w == 1) {
+      // if only one sample, can't do any interpolation
+      out[0] = out[1] = input[0];
+      return out;
+   }
+
+   out[0] = input[0];
+   out[1] = stbi__div4(input[0]*3 + input[1] + 2);
+   for (i=1; i < w-1; ++i) {
+      int n = 3*input[i]+2;
+      out[i*2+0] = stbi__div4(n+input[i-1]);
+      out[i*2+1] = stbi__div4(n+input[i+1]);
+   }
+   out[i*2+0] = stbi__div4(input[w-2]*3 + input[w-1] + 2);
+   out[i*2+1] = input[w-1];
+
+   STBI_NOTUSED(in_far);
+   STBI_NOTUSED(hs);
+
+   return out;
+}
+
+#define stbi__div16(x) ((stbi_uc) ((x) >> 4))
+
+static stbi_uc *stbi__resample_row_hv_2(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   // need to generate 2x2 samples for every one in input
+   int i,t0,t1;
+   if (w == 1) {
+      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
+      return out;
+   }
+
+   t1 = 3*in_near[0] + in_far[0];
+   out[0] = stbi__div4(t1+2);
+   for (i=1; i < w; ++i) {
+      t0 = t1;
+      t1 = 3*in_near[i]+in_far[i];
+      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
+      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
+   }
+   out[w*2-1] = stbi__div4(t1+2);
+
+   STBI_NOTUSED(hs);
+
+   return out;
+}
+
+#if defined(STBI_SSE2) || defined(STBI_NEON)
+static stbi_uc *stbi__resample_row_hv_2_simd(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   // need to generate 2x2 samples for every one in input
+   int i=0,t0,t1;
+
+   if (w == 1) {
+      out[0] = out[1] = stbi__div4(3*in_near[0] + in_far[0] + 2);
+      return out;
+   }
+
+   t1 = 3*in_near[0] + in_far[0];
+   // process groups of 8 pixels for as long as we can.
+   // note we can't handle the last pixel in a row in this loop
+   // because we need to handle the filter boundary conditions.
+   for (; i < ((w-1) & ~7); i += 8) {
+#if defined(STBI_SSE2)
+      // load and perform the vertical filtering pass
+      // this uses 3*x + y = 4*x + (y - x)
+      __m128i zero  = _mm_setzero_si128();
+      __m128i farb  = _mm_loadl_epi64((__m128i *) (in_far + i));
+      __m128i nearb = _mm_loadl_epi64((__m128i *) (in_near + i));
+      __m128i farw  = _mm_unpacklo_epi8(farb, zero);
+      __m128i nearw = _mm_unpacklo_epi8(nearb, zero);
+      __m128i diff  = _mm_sub_epi16(farw, nearw);
+      __m128i nears = _mm_slli_epi16(nearw, 2);
+      __m128i curr  = _mm_add_epi16(nears, diff); // current row
+
+      // horizontal filter works the same based on shifted versions of current
+      // row. "prev" is current row shifted right by 1 pixel; we need to
+      // insert the previous pixel value (from t1).
+      // "next" is current row shifted left by 1 pixel, with first pixel
+      // of next block of 8 pixels added in.
+      __m128i prv0 = _mm_slli_si128(curr, 2);
+      __m128i nxt0 = _mm_srli_si128(curr, 2);
+      __m128i prev = _mm_insert_epi16(prv0, t1, 0);
+      __m128i next = _mm_insert_epi16(nxt0, 3*in_near[i+8] + in_far[i+8], 7);
+
+      // horizontal filter, polyphase implementation since it's convenient:
+      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
+      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
+      // note the shared term.
+      __m128i bias  = _mm_set1_epi16(8);
+      __m128i curs = _mm_slli_epi16(curr, 2);
+      __m128i prvd = _mm_sub_epi16(prev, curr);
+      __m128i nxtd = _mm_sub_epi16(next, curr);
+      __m128i curb = _mm_add_epi16(curs, bias);
+      __m128i even = _mm_add_epi16(prvd, curb);
+      __m128i odd  = _mm_add_epi16(nxtd, curb);
+
+      // interleave even and odd pixels, then undo scaling.
+      __m128i int0 = _mm_unpacklo_epi16(even, odd);
+      __m128i int1 = _mm_unpackhi_epi16(even, odd);
+      __m128i de0  = _mm_srli_epi16(int0, 4);
+      __m128i de1  = _mm_srli_epi16(int1, 4);
+
+      // pack and write output
+      __m128i outv = _mm_packus_epi16(de0, de1);
+      _mm_storeu_si128((__m128i *) (out + i*2), outv);
+#elif defined(STBI_NEON)
+      // load and perform the vertical filtering pass
+      // this uses 3*x + y = 4*x + (y - x)
+      uint8x8_t farb  = vld1_u8(in_far + i);
+      uint8x8_t nearb = vld1_u8(in_near + i);
+      int16x8_t diff  = vreinterpretq_s16_u16(vsubl_u8(farb, nearb));
+      int16x8_t nears = vreinterpretq_s16_u16(vshll_n_u8(nearb, 2));
+      int16x8_t curr  = vaddq_s16(nears, diff); // current row
+
+      // horizontal filter works the same based on shifted versions of current
+      // row. "prev" is current row shifted right by 1 pixel; we need to
+      // insert the previous pixel value (from t1).
+      // "next" is current row shifted left by 1 pixel, with first pixel
+      // of next block of 8 pixels added in.
+      int16x8_t prv0 = vextq_s16(curr, curr, 7);
+      int16x8_t nxt0 = vextq_s16(curr, curr, 1);
+      int16x8_t prev = vsetq_lane_s16(t1, prv0, 0);
+      int16x8_t next = vsetq_lane_s16(3*in_near[i+8] + in_far[i+8], nxt0, 7);
+
+      // horizontal filter, polyphase implementation since it's convenient:
+      // even pixels = 3*cur + prev = cur*4 + (prev - cur)
+      // odd  pixels = 3*cur + next = cur*4 + (next - cur)
+      // note the shared term.
+      int16x8_t curs = vshlq_n_s16(curr, 2);
+      int16x8_t prvd = vsubq_s16(prev, curr);
+      int16x8_t nxtd = vsubq_s16(next, curr);
+      int16x8_t even = vaddq_s16(curs, prvd);
+      int16x8_t odd  = vaddq_s16(curs, nxtd);
+
+      // undo scaling and round, then store with even/odd phases interleaved
+      uint8x8x2_t o;
+      o.val[0] = vqrshrun_n_s16(even, 4);
+      o.val[1] = vqrshrun_n_s16(odd,  4);
+      vst2_u8(out + i*2, o);
+#endif
+
+      // "previous" value for next iter
+      t1 = 3*in_near[i+7] + in_far[i+7];
+   }
+
+   t0 = t1;
+   t1 = 3*in_near[i] + in_far[i];
+   out[i*2] = stbi__div16(3*t1 + t0 + 8);
+
+   for (++i; i < w; ++i) {
+      t0 = t1;
+      t1 = 3*in_near[i]+in_far[i];
+      out[i*2-1] = stbi__div16(3*t0 + t1 + 8);
+      out[i*2  ] = stbi__div16(3*t1 + t0 + 8);
+   }
+   out[w*2-1] = stbi__div4(t1+2);
+
+   STBI_NOTUSED(hs);
+
+   return out;
+}
+#endif
+
+static stbi_uc *stbi__resample_row_generic(stbi_uc *out, stbi_uc *in_near, stbi_uc *in_far, int w, int hs)
+{
+   // resample with nearest-neighbor
+   int i,j;
+   STBI_NOTUSED(in_far);
+   for (i=0; i < w; ++i)
+      for (j=0; j < hs; ++j)
+         out[i*hs+j] = in_near[i];
+   return out;
+}
+
+// this is a reduced-precision calculation of YCbCr-to-RGB introduced
+// to make sure the code produces the same results in both SIMD and scalar
+#define stbi__float2fixed(x)  (((int) ((x) * 4096.0f + 0.5f)) << 8)
+static void stbi__YCbCr_to_RGB_row(stbi_uc *out, const stbi_uc *y, const stbi_uc *pcb, const stbi_uc *pcr, int count, int step)
+{
+   int i;
+   for (i=0; i < count; ++i) {
+      int y_fixed = (y[i] << 20) + (1<<19); // rounding
+      int r,g,b;
+      int cr = pcr[i] - 128;
+      int cb = pcb[i] - 128;
+      r = y_fixed +  cr* stbi__float2fixed(1.40200f);
+      g = y_fixed + (cr*-stbi__float2fixed(0.71414f)) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
+      b = y_fixed                                     +   cb* stbi__float2fixed(1.77200f);
+      r >>= 20;
+      g >>= 20;
+      b >>= 20;
+      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
+      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
+      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
+      out[0] = (stbi_uc)r;
+      out[1] = (stbi_uc)g;
+      out[2] = (stbi_uc)b;
+      out[3] = 255;
+      out += step;
+   }
+}
+
+#if defined(STBI_SSE2) || defined(STBI_NEON)
+static void stbi__YCbCr_to_RGB_simd(stbi_uc *out, stbi_uc const *y, stbi_uc const *pcb, stbi_uc const *pcr, int count, int step)
+{
+   int i = 0;
+
+#ifdef STBI_SSE2
+   // step == 3 is pretty ugly on the final interleave, and I'm not convinced
+   // it's useful in practice (you wouldn't use it for textures, for example).
+   // so just accelerate step == 4 case.
+   if (step == 4) {
+      // this is a fairly straightforward implementation and not super-optimized.
+      __m128i signflip  = _mm_set1_epi8(-0x80);
+      __m128i cr_const0 = _mm_set1_epi16(   (short) ( 1.40200f*4096.0f+0.5f));
+      __m128i cr_const1 = _mm_set1_epi16( - (short) ( 0.71414f*4096.0f+0.5f));
+      __m128i cb_const0 = _mm_set1_epi16( - (short) ( 0.34414f*4096.0f+0.5f));
+      __m128i cb_const1 = _mm_set1_epi16(   (short) ( 1.77200f*4096.0f+0.5f));
+      __m128i y_bias = _mm_set1_epi8((char) (unsigned char) 128);
+      __m128i xw = _mm_set1_epi16(255); // alpha channel
+
+      for (; i+7 < count; i += 8) {
+         // load
+         __m128i y_bytes = _mm_loadl_epi64((__m128i *) (y+i));
+         __m128i cr_bytes = _mm_loadl_epi64((__m128i *) (pcr+i));
+         __m128i cb_bytes = _mm_loadl_epi64((__m128i *) (pcb+i));
+         __m128i cr_biased = _mm_xor_si128(cr_bytes, signflip); // -128
+         __m128i cb_biased = _mm_xor_si128(cb_bytes, signflip); // -128
+
+         // unpack to short (and left-shift cr, cb by 8)
+         __m128i yw  = _mm_unpacklo_epi8(y_bias, y_bytes);
+         __m128i crw = _mm_unpacklo_epi8(_mm_setzero_si128(), cr_biased);
+         __m128i cbw = _mm_unpacklo_epi8(_mm_setzero_si128(), cb_biased);
+
+         // color transform
+         __m128i yws = _mm_srli_epi16(yw, 4);
+         __m128i cr0 = _mm_mulhi_epi16(cr_const0, crw);
+         __m128i cb0 = _mm_mulhi_epi16(cb_const0, cbw);
+         __m128i cb1 = _mm_mulhi_epi16(cbw, cb_const1);
+         __m128i cr1 = _mm_mulhi_epi16(crw, cr_const1);
+         __m128i rws = _mm_add_epi16(cr0, yws);
+         __m128i gwt = _mm_add_epi16(cb0, yws);
+         __m128i bws = _mm_add_epi16(yws, cb1);
+         __m128i gws = _mm_add_epi16(gwt, cr1);
+
+         // descale
+         __m128i rw = _mm_srai_epi16(rws, 4);
+         __m128i bw = _mm_srai_epi16(bws, 4);
+         __m128i gw = _mm_srai_epi16(gws, 4);
+
+         // back to byte, set up for transpose
+         __m128i brb = _mm_packus_epi16(rw, bw);
+         __m128i gxb = _mm_packus_epi16(gw, xw);
+
+         // transpose to interleave channels
+         __m128i t0 = _mm_unpacklo_epi8(brb, gxb);
+         __m128i t1 = _mm_unpackhi_epi8(brb, gxb);
+         __m128i o0 = _mm_unpacklo_epi16(t0, t1);
+         __m128i o1 = _mm_unpackhi_epi16(t0, t1);
+
+         // store
+         _mm_storeu_si128((__m128i *) (out + 0), o0);
+         _mm_storeu_si128((__m128i *) (out + 16), o1);
+         out += 32;
+      }
+   }
+#endif
+
+#ifdef STBI_NEON
+   // in this version, step=3 support would be easy to add. but is there demand?
+   if (step == 4) {
+      // this is a fairly straightforward implementation and not super-optimized.
+      uint8x8_t signflip = vdup_n_u8(0x80);
+      int16x8_t cr_const0 = vdupq_n_s16(   (short) ( 1.40200f*4096.0f+0.5f));
+      int16x8_t cr_const1 = vdupq_n_s16( - (short) ( 0.71414f*4096.0f+0.5f));
+      int16x8_t cb_const0 = vdupq_n_s16( - (short) ( 0.34414f*4096.0f+0.5f));
+      int16x8_t cb_const1 = vdupq_n_s16(   (short) ( 1.77200f*4096.0f+0.5f));
+
+      for (; i+7 < count; i += 8) {
+         // load
+         uint8x8_t y_bytes  = vld1_u8(y + i);
+         uint8x8_t cr_bytes = vld1_u8(pcr + i);
+         uint8x8_t cb_bytes = vld1_u8(pcb + i);
+         int8x8_t cr_biased = vreinterpret_s8_u8(vsub_u8(cr_bytes, signflip));
+         int8x8_t cb_biased = vreinterpret_s8_u8(vsub_u8(cb_bytes, signflip));
+
+         // expand to s16
+         int16x8_t yws = vreinterpretq_s16_u16(vshll_n_u8(y_bytes, 4));
+         int16x8_t crw = vshll_n_s8(cr_biased, 7);
+         int16x8_t cbw = vshll_n_s8(cb_biased, 7);
+
+         // color transform
+         int16x8_t cr0 = vqdmulhq_s16(crw, cr_const0);
+         int16x8_t cb0 = vqdmulhq_s16(cbw, cb_const0);
+         int16x8_t cr1 = vqdmulhq_s16(crw, cr_const1);
+         int16x8_t cb1 = vqdmulhq_s16(cbw, cb_const1);
+         int16x8_t rws = vaddq_s16(yws, cr0);
+         int16x8_t gws = vaddq_s16(vaddq_s16(yws, cb0), cr1);
+         int16x8_t bws = vaddq_s16(yws, cb1);
+
+         // undo scaling, round, convert to byte
+         uint8x8x4_t o;
+         o.val[0] = vqrshrun_n_s16(rws, 4);
+         o.val[1] = vqrshrun_n_s16(gws, 4);
+         o.val[2] = vqrshrun_n_s16(bws, 4);
+         o.val[3] = vdup_n_u8(255);
+
+         // store, interleaving r/g/b/a
+         vst4_u8(out, o);
+         out += 8*4;
+      }
+   }
+#endif
+
+   for (; i < count; ++i) {
+      int y_fixed = (y[i] << 20) + (1<<19); // rounding
+      int r,g,b;
+      int cr = pcr[i] - 128;
+      int cb = pcb[i] - 128;
+      r = y_fixed + cr* stbi__float2fixed(1.40200f);
+      g = y_fixed + cr*-stbi__float2fixed(0.71414f) + ((cb*-stbi__float2fixed(0.34414f)) & 0xffff0000);
+      b = y_fixed                                   +   cb* stbi__float2fixed(1.77200f);
+      r >>= 20;
+      g >>= 20;
+      b >>= 20;
+      if ((unsigned) r > 255) { if (r < 0) r = 0; else r = 255; }
+      if ((unsigned) g > 255) { if (g < 0) g = 0; else g = 255; }
+      if ((unsigned) b > 255) { if (b < 0) b = 0; else b = 255; }
+      out[0] = (stbi_uc)r;
+      out[1] = (stbi_uc)g;
+      out[2] = (stbi_uc)b;
+      out[3] = 255;
+      out += step;
+   }
+}
+#endif
+
+// set up the kernels
+static void stbi__setup_jpeg(stbi__jpeg *j)
+{
+   j->idct_block_kernel = stbi__idct_block;
+   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_row;
+   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2;
+
+#ifdef STBI_SSE2
+   if (stbi__sse2_available()) {
+      j->idct_block_kernel = stbi__idct_simd;
+      j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
+      j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
+   }
+#endif
+
+#ifdef STBI_NEON
+   j->idct_block_kernel = stbi__idct_simd;
+   j->YCbCr_to_RGB_kernel = stbi__YCbCr_to_RGB_simd;
+   j->resample_row_hv_2_kernel = stbi__resample_row_hv_2_simd;
+#endif
+}
+
+// clean up the temporary component buffers
+static void stbi__cleanup_jpeg(stbi__jpeg *j)
+{
+   stbi__free_jpeg_components(j, j->s->img_n, 0);
+}
+
+typedef struct
+{
+   resample_row_func resample;
+   stbi_uc *line0,*line1;
+   int hs,vs;   // expansion factor in each axis
+   int w_lores; // horizontal pixels pre-expansion
+   int ystep;   // how far through vertical expansion we are
+   int ypos;    // which pre-expansion row we're on
+} stbi__resample;
+
+// fast 0..255 * 0..255 => 0..255 rounded multiplication
+static stbi_uc stbi__blinn_8x8(stbi_uc x, stbi_uc y)
+{
+   unsigned int t = x*y + 128;
+   return (stbi_uc) ((t + (t >> 8)) >> 8);
+}
+
+static stbi_uc *load_jpeg_image(stbi__jpeg *z, int *out_x, int *out_y, int *comp, int req_comp)
+{
+   int n, decode_n, is_rgb;
+   z->s->img_n = 0; // make stbi__cleanup_jpeg safe
+
+   // validate req_comp
+   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
+
+   // load a jpeg image from whichever source, but leave in YCbCr format
+   if (!stbi__decode_jpeg_image(z)) { stbi__cleanup_jpeg(z); return NULL; }
+
+   // determine actual number of components to generate
+   n = req_comp ? req_comp : z->s->img_n >= 3 ? 3 : 1;
+
+   is_rgb = z->s->img_n == 3 && (z->rgb == 3 || (z->app14_color_transform == 0 && !z->jfif));
+
+   if (z->s->img_n == 3 && n < 3 && !is_rgb)
+      decode_n = 1;
+   else
+      decode_n = z->s->img_n;
+
+   // nothing to do if no components requested; check this now to avoid
+   // accessing uninitialized coutput[0] later
+   if (decode_n <= 0) { stbi__cleanup_jpeg(z); return NULL; }
+
+   // resample and color-convert
+   {
+      int k;
+      unsigned int i,j;
+      stbi_uc *output;
+      stbi_uc *coutput[4] = { NULL, NULL, NULL, NULL };
+
+      stbi__resample res_comp[4];
+
+      for (k=0; k < decode_n; ++k) {
+         stbi__resample *r = &res_comp[k];
+
+         // allocate line buffer big enough for upsampling off the edges
+         // with upsample factor of 4
+         z->img_comp[k].linebuf = (stbi_uc *) stbi__malloc(z->s->img_x + 3);
+         if (!z->img_comp[k].linebuf) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
+
+         r->hs      = z->img_h_max / z->img_comp[k].h;
+         r->vs      = z->img_v_max / z->img_comp[k].v;
+         r->ystep   = r->vs >> 1;
+         r->w_lores = (z->s->img_x + r->hs-1) / r->hs;
+         r->ypos    = 0;
+         r->line0   = r->line1 = z->img_comp[k].data;
+
+         if      (r->hs == 1 && r->vs == 1) r->resample = resample_row_1;
+         else if (r->hs == 1 && r->vs == 2) r->resample = stbi__resample_row_v_2;
+         else if (r->hs == 2 && r->vs == 1) r->resample = stbi__resample_row_h_2;
+         else if (r->hs == 2 && r->vs == 2) r->resample = z->resample_row_hv_2_kernel;
+         else                               r->resample = stbi__resample_row_generic;
+      }
+
+      // can't error after this, so this is safe
+      output = (stbi_uc *) stbi__malloc_mad3(n, z->s->img_x, z->s->img_y, 1);
+      if (!output) { stbi__cleanup_jpeg(z); return stbi__errpuc("outofmem", "Out of memory"); }
+
+      // now go ahead and resample
+      for (j=0; j < z->s->img_y; ++j) {
+         stbi_uc *out = output + n * z->s->img_x * j;
+         for (k=0; k < decode_n; ++k) {
+            stbi__resample *r = &res_comp[k];
+            int y_bot = r->ystep >= (r->vs >> 1);
+            coutput[k] = r->resample(z->img_comp[k].linebuf,
+                                     y_bot ? r->line1 : r->line0,
+                                     y_bot ? r->line0 : r->line1,
+                                     r->w_lores, r->hs);
+            if (++r->ystep >= r->vs) {
+               r->ystep = 0;
+               r->line0 = r->line1;
+               if (++r->ypos < z->img_comp[k].y)
+                  r->line1 += z->img_comp[k].w2;
+            }
+         }
+         if (n >= 3) {
+            stbi_uc *y = coutput[0];
+            if (z->s->img_n == 3) {
+               if (is_rgb) {
+                  for (i=0; i < z->s->img_x; ++i) {
+                     out[0] = y[i];
+                     out[1] = coutput[1][i];
+                     out[2] = coutput[2][i];
+                     out[3] = 255;
+                     out += n;
+                  }
+               } else {
+                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+               }
+            } else if (z->s->img_n == 4) {
+               if (z->app14_color_transform == 0) { // CMYK
+                  for (i=0; i < z->s->img_x; ++i) {
+                     stbi_uc m = coutput[3][i];
+                     out[0] = stbi__blinn_8x8(coutput[0][i], m);
+                     out[1] = stbi__blinn_8x8(coutput[1][i], m);
+                     out[2] = stbi__blinn_8x8(coutput[2][i], m);
+                     out[3] = 255;
+                     out += n;
+                  }
+               } else if (z->app14_color_transform == 2) { // YCCK
+                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+                  for (i=0; i < z->s->img_x; ++i) {
+                     stbi_uc m = coutput[3][i];
+                     out[0] = stbi__blinn_8x8(255 - out[0], m);
+                     out[1] = stbi__blinn_8x8(255 - out[1], m);
+                     out[2] = stbi__blinn_8x8(255 - out[2], m);
+                     out += n;
+                  }
+               } else { // YCbCr + alpha?  Ignore the fourth channel for now
+                  z->YCbCr_to_RGB_kernel(out, y, coutput[1], coutput[2], z->s->img_x, n);
+               }
+            } else
+               for (i=0; i < z->s->img_x; ++i) {
+                  out[0] = out[1] = out[2] = y[i];
+                  out[3] = 255; // not used if n==3
+                  out += n;
+               }
+         } else {
+            if (is_rgb) {
+               if (n == 1)
+                  for (i=0; i < z->s->img_x; ++i)
+                     *out++ = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
+               else {
+                  for (i=0; i < z->s->img_x; ++i, out += 2) {
+                     out[0] = stbi__compute_y(coutput[0][i], coutput[1][i], coutput[2][i]);
+                     out[1] = 255;
+                  }
+               }
+            } else if (z->s->img_n == 4 && z->app14_color_transform == 0) {
+               for (i=0; i < z->s->img_x; ++i) {
+                  stbi_uc m = coutput[3][i];
+                  stbi_uc r = stbi__blinn_8x8(coutput[0][i], m);
+                  stbi_uc g = stbi__blinn_8x8(coutput[1][i], m);
+                  stbi_uc b = stbi__blinn_8x8(coutput[2][i], m);
+                  out[0] = stbi__compute_y(r, g, b);
+                  out[1] = 255;
+                  out += n;
+               }
+            } else if (z->s->img_n == 4 && z->app14_color_transform == 2) {
+               for (i=0; i < z->s->img_x; ++i) {
+                  out[0] = stbi__blinn_8x8(255 - coutput[0][i], coutput[3][i]);
+                  out[1] = 255;
+                  out += n;
+               }
+            } else {
+               stbi_uc *y = coutput[0];
+               if (n == 1)
+                  for (i=0; i < z->s->img_x; ++i) out[i] = y[i];
+               else
+                  for (i=0; i < z->s->img_x; ++i) { *out++ = y[i]; *out++ = 255; }
+            }
+         }
+      }
+      stbi__cleanup_jpeg(z);
+      *out_x = z->s->img_x;
+      *out_y = z->s->img_y;
+      if (comp) *comp = z->s->img_n >= 3 ? 3 : 1; // report original components, not output
+      return output;
+   }
+}
+
+static void *stbi__jpeg_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   unsigned char* result;
+   stbi__jpeg* j = (stbi__jpeg*) stbi__malloc(sizeof(stbi__jpeg));
+   if (!j) return stbi__errpuc("outofmem", "Out of memory");
+   STBI_NOTUSED(ri);
+   j->s = s;
+   stbi__setup_jpeg(j);
+   result = load_jpeg_image(j, x,y,comp,req_comp);
+   STBI_FREE(j);
+   return result;
+}
+
+static int stbi__jpeg_test(stbi__context *s)
+{
+   int r;
+   stbi__jpeg* j = (stbi__jpeg*)stbi__malloc(sizeof(stbi__jpeg));
+   if (!j) return stbi__err("outofmem", "Out of memory");
+   j->s = s;
+   stbi__setup_jpeg(j);
+   r = stbi__decode_jpeg_header(j, STBI__SCAN_type);
+   stbi__rewind(s);
+   STBI_FREE(j);
+   return r;
+}
+
+static int stbi__jpeg_info_raw(stbi__jpeg *j, int *x, int *y, int *comp)
+{
+   if (!stbi__decode_jpeg_header(j, STBI__SCAN_header)) {
+      stbi__rewind( j->s );
+      return 0;
+   }
+   if (x) *x = j->s->img_x;
+   if (y) *y = j->s->img_y;
+   if (comp) *comp = j->s->img_n >= 3 ? 3 : 1;
+   return 1;
+}
+
+static int stbi__jpeg_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   int result;
+   stbi__jpeg* j = (stbi__jpeg*) (stbi__malloc(sizeof(stbi__jpeg)));
+   if (!j) return stbi__err("outofmem", "Out of memory");
+   j->s = s;
+   result = stbi__jpeg_info_raw(j, x, y, comp);
+   STBI_FREE(j);
+   return result;
+}
+#endif
+
+// public domain zlib decode    v0.2  Sean Barrett 2006-11-18
+//    simple implementation
+//      - all input must be provided in an upfront buffer
+//      - all output is written to a single output buffer (can malloc/realloc)
+//    performance
+//      - fast huffman
+
+#ifndef STBI_NO_ZLIB
+
+// fast-way is faster to check than jpeg huffman, but slow way is slower
+#define STBI__ZFAST_BITS  9 // accelerate all cases in default tables
+#define STBI__ZFAST_MASK  ((1 << STBI__ZFAST_BITS) - 1)
+#define STBI__ZNSYMS 288 // number of symbols in literal/length alphabet
+
+// zlib-style huffman encoding
+// (jpeg packs from left, zlib from right, so can't share code)
+typedef struct
+{
+   stbi__uint16 fast[1 << STBI__ZFAST_BITS];
+   stbi__uint16 firstcode[16];
+   int maxcode[17];
+   stbi__uint16 firstsymbol[16];
+   stbi_uc  size[STBI__ZNSYMS];
+   stbi__uint16 value[STBI__ZNSYMS];
+} stbi__zhuffman;
+
+stbi_inline static int stbi__bitreverse16(int n)
+{
+  n = ((n & 0xAAAA) >>  1) | ((n & 0x5555) << 1);
+  n = ((n & 0xCCCC) >>  2) | ((n & 0x3333) << 2);
+  n = ((n & 0xF0F0) >>  4) | ((n & 0x0F0F) << 4);
+  n = ((n & 0xFF00) >>  8) | ((n & 0x00FF) << 8);
+  return n;
+}
+
+stbi_inline static int stbi__bit_reverse(int v, int bits)
+{
+   STBI_ASSERT(bits <= 16);
+   // to bit reverse n bits, reverse 16 and shift
+   // e.g. 11 bits, bit reverse and shift away 5
+   return stbi__bitreverse16(v) >> (16-bits);
+}
+
+static int stbi__zbuild_huffman(stbi__zhuffman *z, const stbi_uc *sizelist, int num)
+{
+   int i,k=0;
+   int code, next_code[16], sizes[17];
+
+   // DEFLATE spec for generating codes
+   memset(sizes, 0, sizeof(sizes));
+   memset(z->fast, 0, sizeof(z->fast));
+   for (i=0; i < num; ++i)
+      ++sizes[sizelist[i]];
+   sizes[0] = 0;
+   for (i=1; i < 16; ++i)
+      if (sizes[i] > (1 << i))
+         return stbi__err("bad sizes", "Corrupt PNG");
+   code = 0;
+   for (i=1; i < 16; ++i) {
+      next_code[i] = code;
+      z->firstcode[i] = (stbi__uint16) code;
+      z->firstsymbol[i] = (stbi__uint16) k;
+      code = (code + sizes[i]);
+      if (sizes[i])
+         if (code-1 >= (1 << i)) return stbi__err("bad codelengths","Corrupt PNG");
+      z->maxcode[i] = code << (16-i); // preshift for inner loop
+      code <<= 1;
+      k += sizes[i];
+   }
+   z->maxcode[16] = 0x10000; // sentinel
+   for (i=0; i < num; ++i) {
+      int s = sizelist[i];
+      if (s) {
+         int c = next_code[s] - z->firstcode[s] + z->firstsymbol[s];
+         stbi__uint16 fastv = (stbi__uint16) ((s << 9) | i); // code length above bit 9, symbol in low 9 bits
+         z->size [c] = (stbi_uc     ) s;
+         z->value[c] = (stbi__uint16) i;
+         if (s <= STBI__ZFAST_BITS) {
+            int j = stbi__bit_reverse(next_code[s],s);
+            while (j < (1 << STBI__ZFAST_BITS)) {
+               z->fast[j] = fastv;
+               j += (1 << s);
+            }
+         }
+         ++next_code[s];
+      }
+   }
+   return 1;
+}
+
+// zlib-from-memory implementation for PNG reading
+//    because PNG allows splitting the zlib stream arbitrarily,
+//    and it's annoying structurally to have PNG call ZLIB call PNG,
+//    we require PNG read all the IDATs and combine them into a single
+//    memory buffer
+
+typedef struct
+{
+   stbi_uc *zbuffer, *zbuffer_end;
+   int num_bits;
+   stbi__uint32 code_buffer;
+
+   char *zout;
+   char *zout_start;
+   char *zout_end;
+   int   z_expandable;
+
+   stbi__zhuffman z_length, z_distance;
+} stbi__zbuf;
+
+stbi_inline static int stbi__zeof(stbi__zbuf *z)
+{
+   return (z->zbuffer >= z->zbuffer_end);
+}
+
+stbi_inline static stbi_uc stbi__zget8(stbi__zbuf *z)
+{
+   return stbi__zeof(z) ? 0 : *z->zbuffer++;
+}
+
+static void stbi__fill_bits(stbi__zbuf *z)
+{
+   do {
+      if (z->code_buffer >= (1U << z->num_bits)) {
+        z->zbuffer = z->zbuffer_end;  /* treat this as EOF so we fail. */
+        return;
+      }
+      z->code_buffer |= (unsigned int) stbi__zget8(z) << z->num_bits;
+      z->num_bits += 8;
+   } while (z->num_bits <= 24);
+}
+
+stbi_inline static unsigned int stbi__zreceive(stbi__zbuf *z, int n)
+{
+   unsigned int k;
+   if (z->num_bits < n) stbi__fill_bits(z);
+   k = z->code_buffer & ((1 << n) - 1);
+   z->code_buffer >>= n;
+   z->num_bits -= n;
+   return k;
+}
+
+static int stbi__zhuffman_decode_slowpath(stbi__zbuf *a, stbi__zhuffman *z)
+{
+   int b,s,k;
+   // not resolved by fast table, so compute it the slow way
+   // use jpeg approach, which requires MSbits at top
+   k = stbi__bit_reverse(a->code_buffer, 16);
+   for (s=STBI__ZFAST_BITS+1; ; ++s)
+      if (k < z->maxcode[s])
+         break;
+   if (s >= 16) return -1; // invalid code!
+   // code size is s, so:
+   b = (k >> (16-s)) - z->firstcode[s] + z->firstsymbol[s];
+   if (b >= STBI__ZNSYMS) return -1; // some data was corrupt somewhere!
+   if (z->size[b] != s) return -1;  // was originally an assert, but report failure instead.
+   a->code_buffer >>= s;
+   a->num_bits -= s;
+   return z->value[b];
+}
+
+stbi_inline static int stbi__zhuffman_decode(stbi__zbuf *a, stbi__zhuffman *z)
+{
+   int b,s;
+   if (a->num_bits < 16) {
+      if (stbi__zeof(a)) {
+         return -1;   /* report error for unexpected end of data. */
+      }
+      stbi__fill_bits(a);
+   }
+   b = z->fast[a->code_buffer & STBI__ZFAST_MASK];
+   if (b) {
+      s = b >> 9;
+      a->code_buffer >>= s;
+      a->num_bits -= s;
+      return b & 511;
+   }
+   return stbi__zhuffman_decode_slowpath(a, z);
+}
+
+static int stbi__zexpand(stbi__zbuf *z, char *zout, int n)  // need to make room for n bytes
+{
+   char *q;
+   unsigned int cur, limit, old_limit;
+   z->zout = zout;
+   if (!z->z_expandable) return stbi__err("output buffer limit","Corrupt PNG");
+   cur   = (unsigned int) (z->zout - z->zout_start);
+   limit = old_limit = (unsigned) (z->zout_end - z->zout_start);
+   if (UINT_MAX - cur < (unsigned) n) return stbi__err("outofmem", "Out of memory");
+   while (cur + n > limit) {
+      if (limit > UINT_MAX / 2) return stbi__err("outofmem", "Out of memory");
+      limit *= 2;
+   }
+   q = (char *) STBI_REALLOC_SIZED(z->zout_start, old_limit, limit);
+   STBI_NOTUSED(old_limit);
+   if (q == NULL) return stbi__err("outofmem", "Out of memory");
+   z->zout_start = q;
+   z->zout       = q + cur;
+   z->zout_end   = q + limit;
+   return 1;
+}
+
+static const int stbi__zlength_base[31] = {
+   3,4,5,6,7,8,9,10,11,13,
+   15,17,19,23,27,31,35,43,51,59,
+   67,83,99,115,131,163,195,227,258,0,0 };
+
+static const int stbi__zlength_extra[31]=
+{ 0,0,0,0,0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3,4,4,4,4,5,5,5,5,0,0,0 };
+
+static const int stbi__zdist_base[32] = { 1,2,3,4,5,7,9,13,17,25,33,49,65,97,129,193,
+257,385,513,769,1025,1537,2049,3073,4097,6145,8193,12289,16385,24577,0,0};
+
+static const int stbi__zdist_extra[32] =
+{ 0,0,0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8,8,9,9,10,10,11,11,12,12,13,13};
+
+static int stbi__parse_huffman_block(stbi__zbuf *a)
+{
+   char *zout = a->zout;
+   for(;;) {
+      int z = stbi__zhuffman_decode(a, &a->z_length);
+      if (z < 256) {
+         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG"); // error in huffman codes
+         if (zout >= a->zout_end) {
+            if (!stbi__zexpand(a, zout, 1)) return 0;
+            zout = a->zout;
+         }
+         *zout++ = (char) z;
+      } else {
+         stbi_uc *p;
+         int len,dist;
+         if (z == 256) {
+            a->zout = zout;
+            return 1;
+         }
+         z -= 257;
+         len = stbi__zlength_base[z];
+         if (stbi__zlength_extra[z]) len += stbi__zreceive(a, stbi__zlength_extra[z]);
+         z = stbi__zhuffman_decode(a, &a->z_distance);
+         if (z < 0) return stbi__err("bad huffman code","Corrupt PNG");
+         dist = stbi__zdist_base[z];
+         if (stbi__zdist_extra[z]) dist += stbi__zreceive(a, stbi__zdist_extra[z]);
+         if (zout - a->zout_start < dist) return stbi__err("bad dist","Corrupt PNG");
+         if (zout + len > a->zout_end) {
+            if (!stbi__zexpand(a, zout, len)) return 0;
+            zout = a->zout;
+         }
+         p = (stbi_uc *) (zout - dist);
+         if (dist == 1) { // run of one byte; common in images.
+            stbi_uc v = *p;
+            if (len) { do *zout++ = v; while (--len); }
+         } else {
+            if (len) { do *zout++ = *p++; while (--len); }
+         }
+      }
+   }
+}
+
+static int stbi__compute_huffman_codes(stbi__zbuf *a)
+{
+   static const stbi_uc length_dezigzag[19] = { 16,17,18,0,8,7,9,6,10,5,11,4,12,3,13,2,14,1,15 };
+   stbi__zhuffman z_codelength;
+   stbi_uc lencodes[286+32+137]; // padding for maximum single op
+   stbi_uc codelength_sizes[19];
+   int i,n;
+
+   int hlit  = stbi__zreceive(a,5) + 257;
+   int hdist = stbi__zreceive(a,5) + 1;
+   int hclen = stbi__zreceive(a,4) + 4;
+   int ntot  = hlit + hdist;
+
+   memset(codelength_sizes, 0, sizeof(codelength_sizes));
+   for (i=0; i < hclen; ++i) {
+      int s = stbi__zreceive(a,3);
+      codelength_sizes[length_dezigzag[i]] = (stbi_uc) s;
+   }
+   if (!stbi__zbuild_huffman(&z_codelength, codelength_sizes, 19)) return 0;
+
+   n = 0;
+   while (n < ntot) {
+      int c = stbi__zhuffman_decode(a, &z_codelength);
+      if (c < 0 || c >= 19) return stbi__err("bad codelengths", "Corrupt PNG");
+      if (c < 16)
+         lencodes[n++] = (stbi_uc) c;
+      else {
+         stbi_uc fill = 0;
+         if (c == 16) {
+            c = stbi__zreceive(a,2)+3;
+            if (n == 0) return stbi__err("bad codelengths", "Corrupt PNG");
+            fill = lencodes[n-1];
+         } else if (c == 17) {
+            c = stbi__zreceive(a,3)+3;
+         } else if (c == 18) {
+            c = stbi__zreceive(a,7)+11;
+         } else {
+            return stbi__err("bad codelengths", "Corrupt PNG");
+         }
+         if (ntot - n < c) return stbi__err("bad codelengths", "Corrupt PNG");
+         memset(lencodes+n, fill, c);
+         n += c;
+      }
+   }
+   if (n != ntot) return stbi__err("bad codelengths","Corrupt PNG");
+   if (!stbi__zbuild_huffman(&a->z_length, lencodes, hlit)) return 0;
+   if (!stbi__zbuild_huffman(&a->z_distance, lencodes+hlit, hdist)) return 0;
+   return 1;
+}
+
+static int stbi__parse_uncompressed_block(stbi__zbuf *a)
+{
+   stbi_uc header[4];
+   int len,nlen,k;
+   if (a->num_bits & 7)
+      stbi__zreceive(a, a->num_bits & 7); // discard
+   // drain the bit-packed data into header
+   k = 0;
+   while (a->num_bits > 0) {
+      header[k++] = (stbi_uc) (a->code_buffer & 255); // suppress MSVC run-time check
+      a->code_buffer >>= 8;
+      a->num_bits -= 8;
+   }
+   if (a->num_bits < 0) return stbi__err("zlib corrupt","Corrupt PNG");
+   // now fill header the normal way
+   while (k < 4)
+      header[k++] = stbi__zget8(a);
+   len  = header[1] * 256 + header[0];
+   nlen = header[3] * 256 + header[2];
+   if (nlen != (len ^ 0xffff)) return stbi__err("zlib corrupt","Corrupt PNG");
+   if (a->zbuffer + len > a->zbuffer_end) return stbi__err("read past buffer","Corrupt PNG");
+   if (a->zout + len > a->zout_end)
+      if (!stbi__zexpand(a, a->zout, len)) return 0;
+   memcpy(a->zout, a->zbuffer, len);
+   a->zbuffer += len;
+   a->zout += len;
+   return 1;
+}
+
+static int stbi__parse_zlib_header(stbi__zbuf *a)
+{
+   int cmf   = stbi__zget8(a);
+   int cm    = cmf & 15;
+   /* int cinfo = cmf >> 4; */
+   int flg   = stbi__zget8(a);
+   if (stbi__zeof(a)) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
+   if ((cmf*256+flg) % 31 != 0) return stbi__err("bad zlib header","Corrupt PNG"); // zlib spec
+   if (flg & 32) return stbi__err("no preset dict","Corrupt PNG"); // preset dictionary not allowed in png
+   if (cm != 8) return stbi__err("bad compression","Corrupt PNG"); // DEFLATE required for png
+   // window = 1 << (8 + cinfo)... but who cares, we fully buffer output
+   return 1;
+}
+
+static const stbi_uc stbi__zdefault_length[STBI__ZNSYMS] =
+{
+   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,
+   8,8,8,8,8,8,8,8,8,8,8,8,8,8,8,8, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+   9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9, 9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,9,
+   7,7,7,7,7,7,7,7,7,7,7,7,7,7,7,7, 7,7,7,7,7,7,7,7,8,8,8,8,8,8,8,8
+};
+static const stbi_uc stbi__zdefault_distance[32] =
+{
+   5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5,5
+};
+/*
+Init algorithm:
+{
+   int i;   // use <= to match clearly with spec
+   for (i=0; i <= 143; ++i)     stbi__zdefault_length[i]   = 8;
+   for (   ; i <= 255; ++i)     stbi__zdefault_length[i]   = 9;
+   for (   ; i <= 279; ++i)     stbi__zdefault_length[i]   = 7;
+   for (   ; i <= 287; ++i)     stbi__zdefault_length[i]   = 8;
+
+   for (i=0; i <=  31; ++i)     stbi__zdefault_distance[i] = 5;
+}
+*/
+
+static int stbi__parse_zlib(stbi__zbuf *a, int parse_header)
+{
+   int final, type;
+   if (parse_header)
+      if (!stbi__parse_zlib_header(a)) return 0;
+   a->num_bits = 0;
+   a->code_buffer = 0;
+   do {
+      final = stbi__zreceive(a,1);
+      type = stbi__zreceive(a,2);
+      if (type == 0) {
+         if (!stbi__parse_uncompressed_block(a)) return 0;
+      } else if (type == 3) {
+         return 0; // block type 3 is reserved in DEFLATE, so treat it as an error
+      } else {
+         if (type == 1) {
+            // use fixed code lengths
+            if (!stbi__zbuild_huffman(&a->z_length  , stbi__zdefault_length  , STBI__ZNSYMS)) return 0;
+            if (!stbi__zbuild_huffman(&a->z_distance, stbi__zdefault_distance,  32)) return 0;
+         } else {
+            if (!stbi__compute_huffman_codes(a)) return 0;
+         }
+         if (!stbi__parse_huffman_block(a)) return 0;
+      }
+   } while (!final);
+   return 1;
+}
+
+static int stbi__do_zlib(stbi__zbuf *a, char *obuf, int olen, int exp, int parse_header)
+{
+   a->zout_start = obuf;
+   a->zout       = obuf;
+   a->zout_end   = obuf + olen;
+   a->z_expandable = exp;
+
+   return stbi__parse_zlib(a, parse_header);
+}
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize(const char *buffer, int len, int initial_size, int *outlen)
+{
+   stbi__zbuf a;
+   char *p = (char *) stbi__malloc(initial_size);
+   if (p == NULL) return NULL;
+   a.zbuffer = (stbi_uc *) buffer;
+   a.zbuffer_end = (stbi_uc *) buffer + len;
+   if (stbi__do_zlib(&a, p, initial_size, 1, 1)) {
+      if (outlen) *outlen = (int) (a.zout - a.zout_start);
+      return a.zout_start;
+   } else {
+      STBI_FREE(a.zout_start);
+      return NULL;
+   }
+}
+
+STBIDEF char *stbi_zlib_decode_malloc(char const *buffer, int len, int *outlen)
+{
+   return stbi_zlib_decode_malloc_guesssize(buffer, len, 16384, outlen);
+}
+
+STBIDEF char *stbi_zlib_decode_malloc_guesssize_headerflag(const char *buffer, int len, int initial_size, int *outlen, int parse_header)
+{
+   stbi__zbuf a;
+   char *p = (char *) stbi__malloc(initial_size);
+   if (p == NULL) return NULL;
+   a.zbuffer = (stbi_uc *) buffer;
+   a.zbuffer_end = (stbi_uc *) buffer + len;
+   if (stbi__do_zlib(&a, p, initial_size, 1, parse_header)) {
+      if (outlen) *outlen = (int) (a.zout - a.zout_start);
+      return a.zout_start;
+   } else {
+      STBI_FREE(a.zout_start);
+      return NULL;
+   }
+}
+
+STBIDEF int stbi_zlib_decode_buffer(char *obuffer, int olen, char const *ibuffer, int ilen)
+{
+   stbi__zbuf a;
+   a.zbuffer = (stbi_uc *) ibuffer;
+   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
+   if (stbi__do_zlib(&a, obuffer, olen, 0, 1))
+      return (int) (a.zout - a.zout_start);
+   else
+      return -1;
+}
+
+STBIDEF char *stbi_zlib_decode_noheader_malloc(char const *buffer, int len, int *outlen)
+{
+   stbi__zbuf a;
+   char *p = (char *) stbi__malloc(16384);
+   if (p == NULL) return NULL;
+   a.zbuffer = (stbi_uc *) buffer;
+   a.zbuffer_end = (stbi_uc *) buffer+len;
+   if (stbi__do_zlib(&a, p, 16384, 1, 0)) {
+      if (outlen) *outlen = (int) (a.zout - a.zout_start);
+      return a.zout_start;
+   } else {
+      STBI_FREE(a.zout_start);
+      return NULL;
+   }
+}
+
+STBIDEF int stbi_zlib_decode_noheader_buffer(char *obuffer, int olen, const char *ibuffer, int ilen)
+{
+   stbi__zbuf a;
+   a.zbuffer = (stbi_uc *) ibuffer;
+   a.zbuffer_end = (stbi_uc *) ibuffer + ilen;
+   if (stbi__do_zlib(&a, obuffer, olen, 0, 0))
+      return (int) (a.zout - a.zout_start);
+   else
+      return -1;
+}
+#endif
+
+// public domain "baseline" PNG decoder   v0.10  Sean Barrett 2006-11-18
+//    simple implementation
+//      - only 8-bit samples
+//      - no CRC checking
+//      - allocates lots of intermediate memory
+//        - avoids problem of streaming data between subsystems
+//        - avoids explicit window management
+//    performance
+//      - uses stb_zlib, a PD zlib implementation with fast huffman decoding
+
+#ifndef STBI_NO_PNG
+typedef struct
+{
+   stbi__uint32 length;
+   stbi__uint32 type;
+} stbi__pngchunk;
+
+static stbi__pngchunk stbi__get_chunk_header(stbi__context *s)
+{
+   stbi__pngchunk c;
+   c.length = stbi__get32be(s);
+   c.type   = stbi__get32be(s);
+   return c;
+}
+
+static int stbi__check_png_header(stbi__context *s)
+{
+   static const stbi_uc png_sig[8] = { 137,80,78,71,13,10,26,10 };
+   int i;
+   for (i=0; i < 8; ++i)
+      if (stbi__get8(s) != png_sig[i]) return stbi__err("bad png sig","Not a PNG");
+   return 1;
+}
+
+typedef struct
+{
+   stbi__context *s;
+   stbi_uc *idata, *expanded, *out;
+   int depth;
+} stbi__png;
+
+
+enum {
+   STBI__F_none=0,
+   STBI__F_sub=1,
+   STBI__F_up=2,
+   STBI__F_avg=3,
+   STBI__F_paeth=4,
+   // synthetic filters used for first scanline to avoid needing a dummy row of 0s
+   STBI__F_avg_first,
+   STBI__F_paeth_first
+};
+
+static stbi_uc first_row_filter[5] =
+{
+   STBI__F_none,
+   STBI__F_sub,
+   STBI__F_none,
+   STBI__F_avg_first,
+   STBI__F_paeth_first
+};
+
+static int stbi__paeth(int a, int b, int c)
+{
+   int p = a + b - c;
+   int pa = abs(p-a);
+   int pb = abs(p-b);
+   int pc = abs(p-c);
+   if (pa <= pb && pa <= pc) return a;
+   if (pb <= pc) return b;
+   return c;
+}
+
+static const stbi_uc stbi__depth_scale_table[9] = { 0, 0xff, 0x55, 0, 0x11, 0,0,0, 0x01 };
+
+// create the png data from post-deflated data
+static int stbi__create_png_image_raw(stbi__png *a, stbi_uc *raw, stbi__uint32 raw_len, int out_n, stbi__uint32 x, stbi__uint32 y, int depth, int color)
+{
+   int bytes = (depth == 16? 2 : 1);
+   stbi__context *s = a->s;
+   stbi__uint32 i,j,stride = x*out_n*bytes;
+   stbi__uint32 img_len, img_width_bytes;
+   int k;
+   int img_n = s->img_n; // copy it into a local for later
+
+   int output_bytes = out_n*bytes;
+   int filter_bytes = img_n*bytes;
+   int width = x;
+
+   STBI_ASSERT(out_n == s->img_n || out_n == s->img_n+1);
+   a->out = (stbi_uc *) stbi__malloc_mad3(x, y, output_bytes, 0); // extra bytes to write off the end into
+   if (!a->out) return stbi__err("outofmem", "Out of memory");
+
+   if (!stbi__mad3sizes_valid(img_n, x, depth, 7)) return stbi__err("too large", "Corrupt PNG");
+   img_width_bytes = (((img_n * x * depth) + 7) >> 3);
+   img_len = (img_width_bytes + 1) * y;
+
+   // we used to check for exact match between raw_len and img_len on non-interlaced PNGs,
+   // but issue #276 reported a PNG in the wild that had extra data at the end (all zeros),
+   // so just check for raw_len < img_len always.
+   if (raw_len < img_len) return stbi__err("not enough pixels","Corrupt PNG");
+
+   for (j=0; j < y; ++j) {
+      stbi_uc *cur = a->out + stride*j;
+      stbi_uc *prior;
+      int filter = *raw++;
+
+      if (filter > 4)
+         return stbi__err("invalid filter","Corrupt PNG");
+
+      if (depth < 8) {
+         if (img_width_bytes > x) return stbi__err("invalid width","Corrupt PNG");
+         cur += x*out_n - img_width_bytes; // store output to the rightmost img_len bytes, so we can decode in place
+         filter_bytes = 1;
+         width = img_width_bytes;
+      }
+      prior = cur - stride; // bugfix: need to compute this after 'cur +=' computation above
+
+      // if first row, use special filter that doesn't sample previous row
+      if (j == 0) filter = first_row_filter[filter];
+
+      // handle first byte explicitly
+      for (k=0; k < filter_bytes; ++k) {
+         switch (filter) {
+            case STBI__F_none       : cur[k] = raw[k]; break;
+            case STBI__F_sub        : cur[k] = raw[k]; break;
+            case STBI__F_up         : cur[k] = STBI__BYTECAST(raw[k] + prior[k]); break;
+            case STBI__F_avg        : cur[k] = STBI__BYTECAST(raw[k] + (prior[k]>>1)); break;
+            case STBI__F_paeth      : cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(0,prior[k],0)); break;
+            case STBI__F_avg_first  : cur[k] = raw[k]; break;
+            case STBI__F_paeth_first: cur[k] = raw[k]; break;
+         }
+      }
+
+      if (depth == 8) {
+         if (img_n != out_n)
+            cur[img_n] = 255; // first pixel
+         raw += img_n;
+         cur += out_n;
+         prior += out_n;
+      } else if (depth == 16) {
+         if (img_n != out_n) {
+            cur[filter_bytes]   = 255; // first pixel top byte
+            cur[filter_bytes+1] = 255; // first pixel bottom byte
+         }
+         raw += filter_bytes;
+         cur += output_bytes;
+         prior += output_bytes;
+      } else {
+         raw += 1;
+         cur += 1;
+         prior += 1;
+      }
+
+      // this is a little gross, so that we don't switch per-pixel or per-component
+      if (depth < 8 || img_n == out_n) {
+         int nk = (width - 1)*filter_bytes;
+         #define STBI__CASE(f) \
+             case f:     \
+                for (k=0; k < nk; ++k)
+         switch (filter) {
+            // "none" filter turns into a memcpy here; make that explicit.
+            case STBI__F_none:         memcpy(cur, raw, nk); break;
+            STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k-filter_bytes]); } break;
+            STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
+            STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k-filter_bytes])>>1)); } break;
+            STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],prior[k],prior[k-filter_bytes])); } break;
+            STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k-filter_bytes] >> 1)); } break;
+            STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k-filter_bytes],0,0)); } break;
+         }
+         #undef STBI__CASE
+         raw += nk;
+      } else {
+         STBI_ASSERT(img_n+1 == out_n);
+         #define STBI__CASE(f) \
+             case f:     \
+                for (i=x-1; i >= 1; --i, cur[filter_bytes]=255,raw+=filter_bytes,cur+=output_bytes,prior+=output_bytes) \
+                   for (k=0; k < filter_bytes; ++k)
+         switch (filter) {
+            STBI__CASE(STBI__F_none)         { cur[k] = raw[k]; } break;
+            STBI__CASE(STBI__F_sub)          { cur[k] = STBI__BYTECAST(raw[k] + cur[k- output_bytes]); } break;
+            STBI__CASE(STBI__F_up)           { cur[k] = STBI__BYTECAST(raw[k] + prior[k]); } break;
+            STBI__CASE(STBI__F_avg)          { cur[k] = STBI__BYTECAST(raw[k] + ((prior[k] + cur[k- output_bytes])>>1)); } break;
+            STBI__CASE(STBI__F_paeth)        { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],prior[k],prior[k- output_bytes])); } break;
+            STBI__CASE(STBI__F_avg_first)    { cur[k] = STBI__BYTECAST(raw[k] + (cur[k- output_bytes] >> 1)); } break;
+            STBI__CASE(STBI__F_paeth_first)  { cur[k] = STBI__BYTECAST(raw[k] + stbi__paeth(cur[k- output_bytes],0,0)); } break;
+         }
+         #undef STBI__CASE
+
+         // the loop above sets the high byte of the pixels' alpha, but for
+         // 16 bit png files we also need the low byte set. we'll do that here.
+         if (depth == 16) {
+            cur = a->out + stride*j; // start at the beginning of the row again
+            for (i=0; i < x; ++i,cur+=output_bytes) {
+               cur[filter_bytes+1] = 255;
+            }
+         }
+      }
+   }
+
+   // we make a separate pass to expand bits to pixels; for performance,
+   // this could run two scanlines behind the above code, so it won't
+   // interfere with filtering but will still be in the cache.
+   if (depth < 8) {
+      for (j=0; j < y; ++j) {
+         stbi_uc *cur = a->out + stride*j;
+         stbi_uc *in  = a->out + stride*j + x*out_n - img_width_bytes;
+         // unpack 1/2/4-bit into a 8-bit buffer. allows us to keep the common 8-bit path optimal at minimal cost for 1/2/4-bit
+         // PNG guarantees byte alignment; if the width is not a multiple of 8/4/2, we'll decode dummy trailing data that will be skipped in the later loop
+         stbi_uc scale = (color == 0) ? stbi__depth_scale_table[depth] : 1; // scale grayscale values to 0..255 range
+
+         // note that the final byte might overshoot and write more data than desired.
+         // we can allocate enough data that this never writes out of memory, but it
+         // could also overwrite the next scanline. can it overwrite non-empty data
+         // on the next scanline? yes, consider 1-pixel-wide scanlines with 1-bit-per-pixel.
+         // so we need to explicitly clamp the final ones
+
+         if (depth == 4) {
+            for (k=x*img_n; k >= 2; k-=2, ++in) {
+               *cur++ = scale * ((*in >> 4)       );
+               *cur++ = scale * ((*in     ) & 0x0f);
+            }
+            if (k > 0) *cur++ = scale * ((*in >> 4)       );
+         } else if (depth == 2) {
+            for (k=x*img_n; k >= 4; k-=4, ++in) {
+               *cur++ = scale * ((*in >> 6)       );
+               *cur++ = scale * ((*in >> 4) & 0x03);
+               *cur++ = scale * ((*in >> 2) & 0x03);
+               *cur++ = scale * ((*in     ) & 0x03);
+            }
+            if (k > 0) *cur++ = scale * ((*in >> 6)       );
+            if (k > 1) *cur++ = scale * ((*in >> 4) & 0x03);
+            if (k > 2) *cur++ = scale * ((*in >> 2) & 0x03);
+         } else if (depth == 1) {
+            for (k=x*img_n; k >= 8; k-=8, ++in) {
+               *cur++ = scale * ((*in >> 7)       );
+               *cur++ = scale * ((*in >> 6) & 0x01);
+               *cur++ = scale * ((*in >> 5) & 0x01);
+               *cur++ = scale * ((*in >> 4) & 0x01);
+               *cur++ = scale * ((*in >> 3) & 0x01);
+               *cur++ = scale * ((*in >> 2) & 0x01);
+               *cur++ = scale * ((*in >> 1) & 0x01);
+               *cur++ = scale * ((*in     ) & 0x01);
+            }
+            if (k > 0) *cur++ = scale * ((*in >> 7)       );
+            if (k > 1) *cur++ = scale * ((*in >> 6) & 0x01);
+            if (k > 2) *cur++ = scale * ((*in >> 5) & 0x01);
+            if (k > 3) *cur++ = scale * ((*in >> 4) & 0x01);
+            if (k > 4) *cur++ = scale * ((*in >> 3) & 0x01);
+            if (k > 5) *cur++ = scale * ((*in >> 2) & 0x01);
+            if (k > 6) *cur++ = scale * ((*in >> 1) & 0x01);
+         }
+         if (img_n != out_n) {
+            int q;
+            // insert alpha = 255
+            cur = a->out + stride*j;
+            if (img_n == 1) {
+               for (q=x-1; q >= 0; --q) {
+                  cur[q*2+1] = 255;
+                  cur[q*2+0] = cur[q];
+               }
+            } else {
+               STBI_ASSERT(img_n == 3);
+               for (q=x-1; q >= 0; --q) {
+                  cur[q*4+3] = 255;
+                  cur[q*4+2] = cur[q*3+2];
+                  cur[q*4+1] = cur[q*3+1];
+                  cur[q*4+0] = cur[q*3+0];
+               }
+            }
+         }
+      }
+   } else if (depth == 16) {
+      // force the image data from big-endian to platform-native.
+      // this is done in a separate pass due to the decoding relying
+      // on the data being untouched, but could probably be done
+      // per-line during decode if care is taken.
+      stbi_uc *cur = a->out;
+      stbi__uint16 *cur16 = (stbi__uint16*)cur;
+
+      for(i=0; i < x*y*out_n; ++i,cur16++,cur+=2) {
+         *cur16 = (cur[0] << 8) | cur[1];
+      }
+   }
+
+   return 1;
+}
+
+static int stbi__create_png_image(stbi__png *a, stbi_uc *image_data, stbi__uint32 image_data_len, int out_n, int depth, int color, int interlaced)
+{
+   int bytes = (depth == 16 ? 2 : 1);
+   int out_bytes = out_n * bytes;
+   stbi_uc *final;
+   int p;
+   if (!interlaced)
+      return stbi__create_png_image_raw(a, image_data, image_data_len, out_n, a->s->img_x, a->s->img_y, depth, color);
+
+   // de-interlacing
+   final = (stbi_uc *) stbi__malloc_mad3(a->s->img_x, a->s->img_y, out_bytes, 0);
+   if (!final) return stbi__err("outofmem", "Out of memory");
+   for (p=0; p < 7; ++p) {
+      int xorig[] = { 0,4,0,2,0,1,0 };
+      int yorig[] = { 0,0,4,0,2,0,1 };
+      int xspc[]  = { 8,8,4,4,2,2,1 };
+      int yspc[]  = { 8,8,8,4,4,2,2 };
+      int i,j,x,y;
+      // pass1_x[4] = 0, pass1_x[5] = 1, pass1_x[12] = 1
+      x = (a->s->img_x - xorig[p] + xspc[p]-1) / xspc[p];
+      y = (a->s->img_y - yorig[p] + yspc[p]-1) / yspc[p];
+      if (x && y) {
+         stbi__uint32 img_len = ((((a->s->img_n * x * depth) + 7) >> 3) + 1) * y;
+         if (!stbi__create_png_image_raw(a, image_data, image_data_len, out_n, x, y, depth, color)) {
+            STBI_FREE(final);
+            return 0;
+         }
+         for (j=0; j < y; ++j) {
+            for (i=0; i < x; ++i) {
+               int out_y = j*yspc[p]+yorig[p];
+               int out_x = i*xspc[p]+xorig[p];
+               memcpy(final + out_y*a->s->img_x*out_bytes + out_x*out_bytes,
+                      a->out + (j*x+i)*out_bytes, out_bytes);
+            }
+         }
+         STBI_FREE(a->out);
+         image_data += img_len;
+         image_data_len -= img_len;
+      }
+   }
+   a->out = final;
+
+   return 1;
+}
+
+static int stbi__compute_transparency(stbi__png *z, stbi_uc tc[3], int out_n)
+{
+   stbi__context *s = z->s;
+   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+   stbi_uc *p = z->out;
+
+   // compute color-based transparency, assuming we've
+   // already got 255 as the alpha value in the output
+   STBI_ASSERT(out_n == 2 || out_n == 4);
+
+   if (out_n == 2) {
+      for (i=0; i < pixel_count; ++i) {
+         p[1] = (p[0] == tc[0] ? 0 : 255);
+         p += 2;
+      }
+   } else {
+      for (i=0; i < pixel_count; ++i) {
+         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
+            p[3] = 0;
+         p += 4;
+      }
+   }
+   return 1;
+}
+
+static int stbi__compute_transparency16(stbi__png *z, stbi__uint16 tc[3], int out_n)
+{
+   stbi__context *s = z->s;
+   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+   stbi__uint16 *p = (stbi__uint16*) z->out;
+
+   // compute color-based transparency, assuming we've
+   // already got 65535 as the alpha value in the output
+   STBI_ASSERT(out_n == 2 || out_n == 4);
+
+   if (out_n == 2) {
+      for (i = 0; i < pixel_count; ++i) {
+         p[1] = (p[0] == tc[0] ? 0 : 65535);
+         p += 2;
+      }
+   } else {
+      for (i = 0; i < pixel_count; ++i) {
+         if (p[0] == tc[0] && p[1] == tc[1] && p[2] == tc[2])
+            p[3] = 0;
+         p += 4;
+      }
+   }
+   return 1;
+}
+
+static int stbi__expand_png_palette(stbi__png *a, stbi_uc *palette, int len, int pal_img_n)
+{
+   stbi__uint32 i, pixel_count = a->s->img_x * a->s->img_y;
+   stbi_uc *p, *temp_out, *orig = a->out;
+
+   p = (stbi_uc *) stbi__malloc_mad2(pixel_count, pal_img_n, 0);
+   if (p == NULL) return stbi__err("outofmem", "Out of memory");
+
+   // between here and free(out) below, exiting would leak
+   temp_out = p;
+
+   if (pal_img_n == 3) {
+      for (i=0; i < pixel_count; ++i) {
+         int n = orig[i]*4;
+         p[0] = palette[n  ];
+         p[1] = palette[n+1];
+         p[2] = palette[n+2];
+         p += 3;
+      }
+   } else {
+      for (i=0; i < pixel_count; ++i) {
+         int n = orig[i]*4;
+         p[0] = palette[n  ];
+         p[1] = palette[n+1];
+         p[2] = palette[n+2];
+         p[3] = palette[n+3];
+         p += 4;
+      }
+   }
+   STBI_FREE(a->out);
+   a->out = temp_out;
+
+   STBI_NOTUSED(len);
+
+   return 1;
+}
+
+static int stbi__unpremultiply_on_load_global = 0;
+static int stbi__de_iphone_flag_global = 0;
+
+STBIDEF void stbi_set_unpremultiply_on_load(int flag_true_if_should_unpremultiply)
+{
+   stbi__unpremultiply_on_load_global = flag_true_if_should_unpremultiply;
+}
+
+STBIDEF void stbi_convert_iphone_png_to_rgb(int flag_true_if_should_convert)
+{
+   stbi__de_iphone_flag_global = flag_true_if_should_convert;
+}
+
+#ifndef STBI_THREAD_LOCAL
+#define stbi__unpremultiply_on_load  stbi__unpremultiply_on_load_global
+#define stbi__de_iphone_flag  stbi__de_iphone_flag_global
+#else
+static STBI_THREAD_LOCAL int stbi__unpremultiply_on_load_local, stbi__unpremultiply_on_load_set;
+static STBI_THREAD_LOCAL int stbi__de_iphone_flag_local, stbi__de_iphone_flag_set;
+
+STBIDEF void stbi__unpremultiply_on_load_thread(int flag_true_if_should_unpremultiply)
+{
+   stbi__unpremultiply_on_load_local = flag_true_if_should_unpremultiply;
+   stbi__unpremultiply_on_load_set = 1;
+}
+
+STBIDEF void stbi_convert_iphone_png_to_rgb_thread(int flag_true_if_should_convert)
+{
+   stbi__de_iphone_flag_local = flag_true_if_should_convert;
+   stbi__de_iphone_flag_set = 1;
+}
+
+#define stbi__unpremultiply_on_load  (stbi__unpremultiply_on_load_set           \
+                                       ? stbi__unpremultiply_on_load_local      \
+                                       : stbi__unpremultiply_on_load_global)
+#define stbi__de_iphone_flag  (stbi__de_iphone_flag_set                         \
+                                ? stbi__de_iphone_flag_local                    \
+                                : stbi__de_iphone_flag_global)
+#endif // STBI_THREAD_LOCAL
+
+static void stbi__de_iphone(stbi__png *z)
+{
+   stbi__context *s = z->s;
+   stbi__uint32 i, pixel_count = s->img_x * s->img_y;
+   stbi_uc *p = z->out;
+
+   if (s->img_out_n == 3) {  // convert bgr to rgb
+      for (i=0; i < pixel_count; ++i) {
+         stbi_uc t = p[0];
+         p[0] = p[2];
+         p[2] = t;
+         p += 3;
+      }
+   } else {
+      STBI_ASSERT(s->img_out_n == 4);
+      if (stbi__unpremultiply_on_load) {
+         // convert bgr to rgb and unpremultiply
+         for (i=0; i < pixel_count; ++i) {
+            stbi_uc a = p[3];
+            stbi_uc t = p[0];
+            if (a) {
+               stbi_uc half = a / 2;
+               p[0] = (p[2] * 255 + half) / a;
+               p[1] = (p[1] * 255 + half) / a;
+               p[2] = ( t   * 255 + half) / a;
+            } else {
+               p[0] = p[2];
+               p[2] = t;
+            }
+            p += 4;
+         }
+      } else {
+         // convert bgr to rgb
+         for (i=0; i < pixel_count; ++i) {
+            stbi_uc t = p[0];
+            p[0] = p[2];
+            p[2] = t;
+            p += 4;
+         }
+      }
+   }
+}
+
+#define STBI__PNG_TYPE(a,b,c,d)  (((unsigned) (a) << 24) + ((unsigned) (b) << 16) + ((unsigned) (c) << 8) + (unsigned) (d))
+
+static int stbi__parse_png_file(stbi__png *z, int scan, int req_comp)
+{
+   stbi_uc palette[1024], pal_img_n=0;
+   stbi_uc has_trans=0, tc[3]={0};
+   stbi__uint16 tc16[3];
+   stbi__uint32 ioff=0, idata_limit=0, i, pal_len=0;
+   int first=1,k,interlace=0, color=0, is_iphone=0;
+   stbi__context *s = z->s;
+
+   z->expanded = NULL;
+   z->idata = NULL;
+   z->out = NULL;
+
+   if (!stbi__check_png_header(s)) return 0;
+
+   if (scan == STBI__SCAN_type) return 1;
+
+   for (;;) {
+      stbi__pngchunk c = stbi__get_chunk_header(s);
+      switch (c.type) {
+         case STBI__PNG_TYPE('C','g','B','I'):
+            is_iphone = 1;
+            stbi__skip(s, c.length);
+            break;
+         case STBI__PNG_TYPE('I','H','D','R'): {
+            int comp,filter;
+            if (!first) return stbi__err("multiple IHDR","Corrupt PNG");
+            first = 0;
+            if (c.length != 13) return stbi__err("bad IHDR len","Corrupt PNG");
+            s->img_x = stbi__get32be(s);
+            s->img_y = stbi__get32be(s);
+            if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+            if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+            z->depth = stbi__get8(s);  if (z->depth != 1 && z->depth != 2 && z->depth != 4 && z->depth != 8 && z->depth != 16)  return stbi__err("1/2/4/8/16-bit only","PNG not supported: 1/2/4/8/16-bit only");
+            color = stbi__get8(s);  if (color > 6)         return stbi__err("bad ctype","Corrupt PNG");
+            if (color == 3 && z->depth == 16)                  return stbi__err("bad ctype","Corrupt PNG");
+            if (color == 3) pal_img_n = 3; else if (color & 1) return stbi__err("bad ctype","Corrupt PNG");
+            comp  = stbi__get8(s);  if (comp) return stbi__err("bad comp method","Corrupt PNG");
+            filter= stbi__get8(s);  if (filter) return stbi__err("bad filter method","Corrupt PNG");
+            interlace = stbi__get8(s); if (interlace>1) return stbi__err("bad interlace method","Corrupt PNG");
+            if (!s->img_x || !s->img_y) return stbi__err("0-pixel image","Corrupt PNG");
+            if (!pal_img_n) {
+               s->img_n = (color & 2 ? 3 : 1) + (color & 4 ? 1 : 0);
+               if ((1 << 30) / s->img_x / s->img_n < s->img_y) return stbi__err("too large", "Image too large to decode");
+               if (scan == STBI__SCAN_header) return 1;
+            } else {
+               // if paletted, then pal_n is our final components, and
+               // img_n is # components to decompress/filter.
+               s->img_n = 1;
+               if ((1 << 30) / s->img_x / 4 < s->img_y) return stbi__err("too large","Corrupt PNG");
+               // if SCAN_header, have to scan to see if we have a tRNS
+            }
+            break;
+         }
+
+         case STBI__PNG_TYPE('P','L','T','E'):  {
+            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+            if (c.length > 256*3) return stbi__err("invalid PLTE","Corrupt PNG");
+            pal_len = c.length / 3;
+            if (pal_len * 3 != c.length) return stbi__err("invalid PLTE","Corrupt PNG");
+            for (i=0; i < pal_len; ++i) {
+               palette[i*4+0] = stbi__get8(s);
+               palette[i*4+1] = stbi__get8(s);
+               palette[i*4+2] = stbi__get8(s);
+               palette[i*4+3] = 255;
+            }
+            break;
+         }
+
+         case STBI__PNG_TYPE('t','R','N','S'): {
+            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+            if (z->idata) return stbi__err("tRNS after IDAT","Corrupt PNG");
+            if (pal_img_n) {
+               if (scan == STBI__SCAN_header) { s->img_n = 4; return 1; }
+               if (pal_len == 0) return stbi__err("tRNS before PLTE","Corrupt PNG");
+               if (c.length > pal_len) return stbi__err("bad tRNS len","Corrupt PNG");
+               pal_img_n = 4;
+               for (i=0; i < c.length; ++i)
+                  palette[i*4+3] = stbi__get8(s);
+            } else {
+               if (!(s->img_n & 1)) return stbi__err("tRNS with alpha","Corrupt PNG");
+               if (c.length != (stbi__uint32) s->img_n*2) return stbi__err("bad tRNS len","Corrupt PNG");
+               has_trans = 1;
+               if (z->depth == 16) {
+                  for (k = 0; k < s->img_n; ++k) tc16[k] = (stbi__uint16)stbi__get16be(s); // copy the values as-is
+               } else {
+                  for (k = 0; k < s->img_n; ++k) tc[k] = (stbi_uc)(stbi__get16be(s) & 255) * stbi__depth_scale_table[z->depth]; // non 8-bit images will be larger
+               }
+            }
+            break;
+         }
+
+         case STBI__PNG_TYPE('I','D','A','T'): {
+            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+            if (pal_img_n && !pal_len) return stbi__err("no PLTE","Corrupt PNG");
+            if (scan == STBI__SCAN_header) { s->img_n = pal_img_n; return 1; }
+            if ((int)(ioff + c.length) < (int)ioff) return 0;
+            if (ioff + c.length > idata_limit) {
+               stbi__uint32 idata_limit_old = idata_limit;
+               stbi_uc *p;
+               if (idata_limit == 0) idata_limit = c.length > 4096 ? c.length : 4096;
+               while (ioff + c.length > idata_limit)
+                  idata_limit *= 2;
+               STBI_NOTUSED(idata_limit_old);
+               p = (stbi_uc *) STBI_REALLOC_SIZED(z->idata, idata_limit_old, idata_limit); if (p == NULL) return stbi__err("outofmem", "Out of memory");
+               z->idata = p;
+            }
+            if (!stbi__getn(s, z->idata+ioff,c.length)) return stbi__err("outofdata","Corrupt PNG");
+            ioff += c.length;
+            break;
+         }
+
+         case STBI__PNG_TYPE('I','E','N','D'): {
+            stbi__uint32 raw_len, bpl;
+            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+            if (scan != STBI__SCAN_load) return 1;
+            if (z->idata == NULL) return stbi__err("no IDAT","Corrupt PNG");
+            // initial guess for decoded data size to avoid unnecessary reallocs
+            bpl = (s->img_x * z->depth + 7) / 8; // bytes per line, per component
+            raw_len = bpl * s->img_y * s->img_n /* pixels */ + s->img_y /* filter mode per row */;
+            z->expanded = (stbi_uc *) stbi_zlib_decode_malloc_guesssize_headerflag((char *) z->idata, ioff, raw_len, (int *) &raw_len, !is_iphone);
+            if (z->expanded == NULL) return 0; // zlib should set error
+            STBI_FREE(z->idata); z->idata = NULL;
+            if ((req_comp == s->img_n+1 && req_comp != 3 && !pal_img_n) || has_trans)
+               s->img_out_n = s->img_n+1;
+            else
+               s->img_out_n = s->img_n;
+            if (!stbi__create_png_image(z, z->expanded, raw_len, s->img_out_n, z->depth, color, interlace)) return 0;
+            if (has_trans) {
+               if (z->depth == 16) {
+                  if (!stbi__compute_transparency16(z, tc16, s->img_out_n)) return 0;
+               } else {
+                  if (!stbi__compute_transparency(z, tc, s->img_out_n)) return 0;
+               }
+            }
+            if (is_iphone && stbi__de_iphone_flag && s->img_out_n > 2)
+               stbi__de_iphone(z);
+            if (pal_img_n) {
+               // pal_img_n == 3 or 4
+               s->img_n = pal_img_n; // record the actual colors we had
+               s->img_out_n = pal_img_n;
+               if (req_comp >= 3) s->img_out_n = req_comp;
+               if (!stbi__expand_png_palette(z, palette, pal_len, s->img_out_n))
+                  return 0;
+            } else if (has_trans) {
+               // non-paletted image with tRNS -> source image has (constant) alpha
+               ++s->img_n;
+            }
+            STBI_FREE(z->expanded); z->expanded = NULL;
+            // end of PNG chunk, read and skip CRC
+            stbi__get32be(s);
+            return 1;
+         }
+
+         default:
+            // if critical, fail
+            if (first) return stbi__err("first not IHDR", "Corrupt PNG");
+            if ((c.type & (1 << 29)) == 0) {
+               #ifndef STBI_NO_FAILURE_STRINGS
+               // not threadsafe
+               static char invalid_chunk[] = "XXXX PNG chunk not known";
+               invalid_chunk[0] = STBI__BYTECAST(c.type >> 24);
+               invalid_chunk[1] = STBI__BYTECAST(c.type >> 16);
+               invalid_chunk[2] = STBI__BYTECAST(c.type >>  8);
+               invalid_chunk[3] = STBI__BYTECAST(c.type >>  0);
+               #endif
+               return stbi__err(invalid_chunk, "PNG not supported: unknown PNG chunk type");
+            }
+            stbi__skip(s, c.length);
+            break;
+      }
+      // end of PNG chunk, read and skip CRC
+      stbi__get32be(s);
+   }
+}
+
+static void *stbi__do_png(stbi__png *p, int *x, int *y, int *n, int req_comp, stbi__result_info *ri)
+{
+   void *result=NULL;
+   if (req_comp < 0 || req_comp > 4) return stbi__errpuc("bad req_comp", "Internal error");
+   if (stbi__parse_png_file(p, STBI__SCAN_load, req_comp)) {
+      if (p->depth <= 8)
+         ri->bits_per_channel = 8;
+      else if (p->depth == 16)
+         ri->bits_per_channel = 16;
+      else
+         return stbi__errpuc("bad bits_per_channel", "PNG not supported: unsupported color depth");
+      result = p->out;
+      p->out = NULL;
+      if (req_comp && req_comp != p->s->img_out_n) {
+         if (ri->bits_per_channel == 8)
+            result = stbi__convert_format((unsigned char *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
+         else
+            result = stbi__convert_format16((stbi__uint16 *) result, p->s->img_out_n, req_comp, p->s->img_x, p->s->img_y);
+         p->s->img_out_n = req_comp;
+         if (result == NULL) return result;
+      }
+      *x = p->s->img_x;
+      *y = p->s->img_y;
+      if (n) *n = p->s->img_n;
+   }
+   STBI_FREE(p->out);      p->out      = NULL;
+   STBI_FREE(p->expanded); p->expanded = NULL;
+   STBI_FREE(p->idata);    p->idata    = NULL;
+
+   return result;
+}
+
+static void *stbi__png_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   stbi__png p;
+   p.s = s;
+   return stbi__do_png(&p, x,y,comp,req_comp, ri);
+}
+
+static int stbi__png_test(stbi__context *s)
+{
+   int r;
+   r = stbi__check_png_header(s);
+   stbi__rewind(s);
+   return r;
+}
+
+static int stbi__png_info_raw(stbi__png *p, int *x, int *y, int *comp)
+{
+   if (!stbi__parse_png_file(p, STBI__SCAN_header, 0)) {
+      stbi__rewind( p->s );
+      return 0;
+   }
+   if (x) *x = p->s->img_x;
+   if (y) *y = p->s->img_y;
+   if (comp) *comp = p->s->img_n;
+   return 1;
+}
+
+static int stbi__png_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   stbi__png p;
+   p.s = s;
+   return stbi__png_info_raw(&p, x, y, comp);
+}
+
+static int stbi__png_is16(stbi__context *s)
+{
+   stbi__png p;
+   p.s = s;
+   if (!stbi__png_info_raw(&p, NULL, NULL, NULL))
+      return 0;
+   if (p.depth != 16) {
+      stbi__rewind(p.s);
+      return 0;
+   }
+   return 1;
+}
+#endif
+
+// Microsoft/Windows BMP image
+
+#ifndef STBI_NO_BMP
+static int stbi__bmp_test_raw(stbi__context *s)
+{
+   int r;
+   int sz;
+   if (stbi__get8(s) != 'B') return 0;
+   if (stbi__get8(s) != 'M') return 0;
+   stbi__get32le(s); // discard filesize
+   stbi__get16le(s); // discard reserved
+   stbi__get16le(s); // discard reserved
+   stbi__get32le(s); // discard data offset
+   sz = stbi__get32le(s);
+   r = (sz == 12 || sz == 40 || sz == 56 || sz == 108 || sz == 124);
+   return r;
+}
+
+static int stbi__bmp_test(stbi__context *s)
+{
+   int r = stbi__bmp_test_raw(s);
+   stbi__rewind(s);
+   return r;
+}
+
+
+// returns 0..31 for the highest set bit, or -1 if z is 0
+static int stbi__high_bit(unsigned int z)
+{
+   int n=0;
+   if (z == 0) return -1;
+   if (z >= 0x10000) { n += 16; z >>= 16; }
+   if (z >= 0x00100) { n +=  8; z >>=  8; }
+   if (z >= 0x00010) { n +=  4; z >>=  4; }
+   if (z >= 0x00004) { n +=  2; z >>=  2; }
+   if (z >= 0x00002) { n +=  1;/* >>=  1;*/ }
+   return n;
+}
+
+static int stbi__bitcount(unsigned int a)
+{
+   a = (a & 0x55555555) + ((a >>  1) & 0x55555555); // max 2
+   a = (a & 0x33333333) + ((a >>  2) & 0x33333333); // max 4
+   a = (a + (a >> 4)) & 0x0f0f0f0f; // max 8 per 4, now 8 bits
+   a = (a + (a >> 8)); // max 16 per 8 bits
+   a = (a + (a >> 16)); // max 32 per 8 bits
+   return a & 0xff;
+}
+
+// extract an arbitrarily-aligned N-bit value (N=bits)
+// from v, and then make it 8-bits long and fractionally
+// extend it to the full range.
+static int stbi__shiftsigned(unsigned int v, int shift, int bits)
+{
+   static unsigned int mul_table[9] = {
+      0,
+      0xff/*0b11111111*/, 0x55/*0b01010101*/, 0x49/*0b01001001*/, 0x11/*0b00010001*/,
+      0x21/*0b00100001*/, 0x41/*0b01000001*/, 0x81/*0b10000001*/, 0x01/*0b00000001*/,
+   };
+   static unsigned int shift_table[9] = {
+      0, 0,0,1,0,2,4,6,0,
+   };
+   if (shift < 0)
+      v <<= -shift;
+   else
+      v >>= shift;
+   STBI_ASSERT(v < 256);
+   v >>= (8-bits);
+   STBI_ASSERT(bits >= 0 && bits <= 8);
+   return (int) ((unsigned) v * mul_table[bits]) >> shift_table[bits];
+}
+
+typedef struct
+{
+   int bpp, offset, hsz;
+   unsigned int mr,mg,mb,ma, all_a;
+   int extra_read;
+} stbi__bmp_data;
+
+static int stbi__bmp_set_mask_defaults(stbi__bmp_data *info, int compress)
+{
+   // BI_BITFIELDS specifies masks explicitly, don't override
+   if (compress == 3)
+      return 1;
+
+   if (compress == 0) {
+      if (info->bpp == 16) {
+         info->mr = 31u << 10;
+         info->mg = 31u <<  5;
+         info->mb = 31u <<  0;
+      } else if (info->bpp == 32) {
+         info->mr = 0xffu << 16;
+         info->mg = 0xffu <<  8;
+         info->mb = 0xffu <<  0;
+         info->ma = 0xffu << 24;
+         info->all_a = 0; // if all_a is 0 at end, then we loaded alpha channel but it was all 0
+      } else {
+         // otherwise, use defaults, which is all-0
+         info->mr = info->mg = info->mb = info->ma = 0;
+      }
+      return 1;
+   }
+   return 0; // error
+}
+
+static void *stbi__bmp_parse_header(stbi__context *s, stbi__bmp_data *info)
+{
+   int hsz;
+   if (stbi__get8(s) != 'B' || stbi__get8(s) != 'M') return stbi__errpuc("not BMP", "Corrupt BMP");
+   stbi__get32le(s); // discard filesize
+   stbi__get16le(s); // discard reserved
+   stbi__get16le(s); // discard reserved
+   info->offset = stbi__get32le(s);
+   info->hsz = hsz = stbi__get32le(s);
+   info->mr = info->mg = info->mb = info->ma = 0;
+   info->extra_read = 14;
+
+   if (info->offset < 0) return stbi__errpuc("bad BMP", "bad BMP");
+
+   if (hsz != 12 && hsz != 40 && hsz != 56 && hsz != 108 && hsz != 124) return stbi__errpuc("unknown BMP", "BMP type not supported: unknown");
+   if (hsz == 12) {
+      s->img_x = stbi__get16le(s);
+      s->img_y = stbi__get16le(s);
+   } else {
+      s->img_x = stbi__get32le(s);
+      s->img_y = stbi__get32le(s);
+   }
+   if (stbi__get16le(s) != 1) return stbi__errpuc("bad BMP", "bad BMP");
+   info->bpp = stbi__get16le(s);
+   if (hsz != 12) {
+      int compress = stbi__get32le(s);
+      if (compress == 1 || compress == 2) return stbi__errpuc("BMP RLE", "BMP type not supported: RLE");
+      if (compress >= 4) return stbi__errpuc("BMP JPEG/PNG", "BMP type not supported: unsupported compression"); // this includes PNG/JPEG modes
+      if (compress == 3 && info->bpp != 16 && info->bpp != 32) return stbi__errpuc("bad BMP", "bad BMP"); // bitfields requires 16 or 32 bits/pixel
+      stbi__get32le(s); // discard sizeof
+      stbi__get32le(s); // discard hres
+      stbi__get32le(s); // discard vres
+      stbi__get32le(s); // discard colorsused
+      stbi__get32le(s); // discard max important
+      if (hsz == 40 || hsz == 56) {
+         if (hsz == 56) {
+            stbi__get32le(s);
+            stbi__get32le(s);
+            stbi__get32le(s);
+            stbi__get32le(s);
+         }
+         if (info->bpp == 16 || info->bpp == 32) {
+            if (compress == 0) {
+               stbi__bmp_set_mask_defaults(info, compress);
+            } else if (compress == 3) {
+               info->mr = stbi__get32le(s);
+               info->mg = stbi__get32le(s);
+               info->mb = stbi__get32le(s);
+               info->extra_read += 12;
+               // not documented, but generated by photoshop and handled by mspaint
+               if (info->mr == info->mg && info->mg == info->mb) {
+                  // ?!?!?
+                  return stbi__errpuc("bad BMP", "bad BMP");
+               }
+            } else
+               return stbi__errpuc("bad BMP", "bad BMP");
+         }
+      } else {
+         // V4/V5 header
+         int i;
+         if (hsz != 108 && hsz != 124)
+            return stbi__errpuc("bad BMP", "bad BMP");
+         info->mr = stbi__get32le(s);
+         info->mg = stbi__get32le(s);
+         info->mb = stbi__get32le(s);
+         info->ma = stbi__get32le(s);
+         if (compress != 3) // override mr/mg/mb unless in BI_BITFIELDS mode, as per docs
+            stbi__bmp_set_mask_defaults(info, compress);
+         stbi__get32le(s); // discard color space
+         for (i=0; i < 12; ++i)
+            stbi__get32le(s); // discard color space parameters
+         if (hsz == 124) {
+            stbi__get32le(s); // discard rendering intent
+            stbi__get32le(s); // discard offset of profile data
+            stbi__get32le(s); // discard size of profile data
+            stbi__get32le(s); // discard reserved
+         }
+      }
+   }
+   return (void *) 1;
+}
+
+
+static void *stbi__bmp_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   stbi_uc *out;
+   unsigned int mr=0,mg=0,mb=0,ma=0, all_a;
+   stbi_uc pal[256][4];
+   int psize=0,i,j,width;
+   int flip_vertically, pad, target;
+   stbi__bmp_data info;
+   STBI_NOTUSED(ri);
+
+   info.all_a = 255;
+   if (stbi__bmp_parse_header(s, &info) == NULL)
+      return NULL; // error code already set
+
+   flip_vertically = ((int) s->img_y) > 0;
+   s->img_y = abs((int) s->img_y);
+
+   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   mr = info.mr;
+   mg = info.mg;
+   mb = info.mb;
+   ma = info.ma;
+   all_a = info.all_a;
+
+   if (info.hsz == 12) {
+      if (info.bpp < 24)
+         psize = (info.offset - info.extra_read - 24) / 3;
+   } else {
+      if (info.bpp < 16)
+         psize = (info.offset - info.extra_read - info.hsz) >> 2;
+   }
+   if (psize == 0) {
+      if (info.offset != s->callback_already_read + (s->img_buffer - s->img_buffer_original)) {
+        return stbi__errpuc("bad offset", "Corrupt BMP");
+      }
+   }
+
+   if (info.bpp == 24 && ma == 0xff000000)
+      s->img_n = 3;
+   else
+      s->img_n = ma ? 4 : 3;
+   if (req_comp && req_comp >= 3) // we can directly decode 3 or 4
+      target = req_comp;
+   else
+      target = s->img_n; // if they want monochrome, we'll post-convert
+
+   // sanity-check size
+   if (!stbi__mad3sizes_valid(target, s->img_x, s->img_y, 0))
+      return stbi__errpuc("too large", "Corrupt BMP");
+
+   out = (stbi_uc *) stbi__malloc_mad3(target, s->img_x, s->img_y, 0);
+   if (!out) return stbi__errpuc("outofmem", "Out of memory");
+   if (info.bpp < 16) {
+      int z=0;
+      if (psize == 0 || psize > 256) { STBI_FREE(out); return stbi__errpuc("invalid", "Corrupt BMP"); }
+      for (i=0; i < psize; ++i) {
+         pal[i][2] = stbi__get8(s);
+         pal[i][1] = stbi__get8(s);
+         pal[i][0] = stbi__get8(s);
+         if (info.hsz != 12) stbi__get8(s);
+         pal[i][3] = 255;
+      }
+      stbi__skip(s, info.offset - info.extra_read - info.hsz - psize * (info.hsz == 12 ? 3 : 4));
+      if (info.bpp == 1) width = (s->img_x + 7) >> 3;
+      else if (info.bpp == 4) width = (s->img_x + 1) >> 1;
+      else if (info.bpp == 8) width = s->img_x;
+      else { STBI_FREE(out); return stbi__errpuc("bad bpp", "Corrupt BMP"); }
+      pad = (-width)&3;
+      if (info.bpp == 1) {
+         for (j=0; j < (int) s->img_y; ++j) {
+            int bit_offset = 7, v = stbi__get8(s);
+            for (i=0; i < (int) s->img_x; ++i) {
+               int color = (v>>bit_offset)&0x1;
+               out[z++] = pal[color][0];
+               out[z++] = pal[color][1];
+               out[z++] = pal[color][2];
+               if (target == 4) out[z++] = 255;
+               if (i+1 == (int) s->img_x) break;
+               if((--bit_offset) < 0) {
+                  bit_offset = 7;
+                  v = stbi__get8(s);
+               }
+            }
+            stbi__skip(s, pad);
+         }
+      } else {
+         for (j=0; j < (int) s->img_y; ++j) {
+            for (i=0; i < (int) s->img_x; i += 2) {
+               int v=stbi__get8(s),v2=0;
+               if (info.bpp == 4) {
+                  v2 = v & 15;
+                  v >>= 4;
+               }
+               out[z++] = pal[v][0];
+               out[z++] = pal[v][1];
+               out[z++] = pal[v][2];
+               if (target == 4) out[z++] = 255;
+               if (i+1 == (int) s->img_x) break;
+               v = (info.bpp == 8) ? stbi__get8(s) : v2;
+               out[z++] = pal[v][0];
+               out[z++] = pal[v][1];
+               out[z++] = pal[v][2];
+               if (target == 4) out[z++] = 255;
+            }
+            stbi__skip(s, pad);
+         }
+      }
+   } else {
+      int rshift=0,gshift=0,bshift=0,ashift=0,rcount=0,gcount=0,bcount=0,acount=0;
+      int z = 0;
+      int easy=0;
+      stbi__skip(s, info.offset - info.extra_read - info.hsz);
+      if (info.bpp == 24) width = 3 * s->img_x;
+      else if (info.bpp == 16) width = 2*s->img_x;
+      else /* bpp = 32 and pad = 0 */ width=0;
+      pad = (-width) & 3;
+      if (info.bpp == 24) {
+         easy = 1;
+      } else if (info.bpp == 32) {
+         if (mb == 0xff && mg == 0xff00 && mr == 0x00ff0000 && ma == 0xff000000)
+            easy = 2;
+      }
+      if (!easy) {
+         if (!mr || !mg || !mb) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
+         // right shift amt to put high bit in position #7
+         rshift = stbi__high_bit(mr)-7; rcount = stbi__bitcount(mr);
+         gshift = stbi__high_bit(mg)-7; gcount = stbi__bitcount(mg);
+         bshift = stbi__high_bit(mb)-7; bcount = stbi__bitcount(mb);
+         ashift = stbi__high_bit(ma)-7; acount = stbi__bitcount(ma);
+         if (rcount > 8 || gcount > 8 || bcount > 8 || acount > 8) { STBI_FREE(out); return stbi__errpuc("bad masks", "Corrupt BMP"); }
+      }
+      for (j=0; j < (int) s->img_y; ++j) {
+         if (easy) {
+            for (i=0; i < (int) s->img_x; ++i) {
+               unsigned char a;
+               out[z+2] = stbi__get8(s);
+               out[z+1] = stbi__get8(s);
+               out[z+0] = stbi__get8(s);
+               z += 3;
+               a = (easy == 2 ? stbi__get8(s) : 255);
+               all_a |= a;
+               if (target == 4) out[z++] = a;
+            }
+         } else {
+            int bpp = info.bpp;
+            for (i=0; i < (int) s->img_x; ++i) {
+               stbi__uint32 v = (bpp == 16 ? (stbi__uint32) stbi__get16le(s) : stbi__get32le(s));
+               unsigned int a;
+               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mr, rshift, rcount));
+               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mg, gshift, gcount));
+               out[z++] = STBI__BYTECAST(stbi__shiftsigned(v & mb, bshift, bcount));
+               a = (ma ? stbi__shiftsigned(v & ma, ashift, acount) : 255);
+               all_a |= a;
+               if (target == 4) out[z++] = STBI__BYTECAST(a);
+            }
+         }
+         stbi__skip(s, pad);
+      }
+   }
+
+   // if alpha channel is all 0s, replace with all 255s
+   if (target == 4 && all_a == 0)
+      for (i=4*s->img_x*s->img_y-1; i >= 0; i -= 4)
+         out[i] = 255;
+
+   if (flip_vertically) {
+      stbi_uc t;
+      for (j=0; j < (int) s->img_y>>1; ++j) {
+         stbi_uc *p1 = out +      j     *s->img_x*target;
+         stbi_uc *p2 = out + (s->img_y-1-j)*s->img_x*target;
+         for (i=0; i < (int) s->img_x*target; ++i) {
+            t = p1[i]; p1[i] = p2[i]; p2[i] = t;
+         }
+      }
+   }
+
+   if (req_comp && req_comp != target) {
+      out = stbi__convert_format(out, target, req_comp, s->img_x, s->img_y);
+      if (out == NULL) return out; // stbi__convert_format frees input on failure
+   }
+
+   *x = s->img_x;
+   *y = s->img_y;
+   if (comp) *comp = s->img_n;
+   return out;
+}
+#endif
+
+// Targa Truevision - TGA
+// by Jonathan Dummer
+#ifndef STBI_NO_TGA
+// returns the component count (STBI_grey, STBI_grey_alpha, STBI_rgb or STBI_rgb_alpha), 0 on error
+static int stbi__tga_get_comp(int bits_per_pixel, int is_grey, int* is_rgb16)
+{
+   // only RGB or RGBA (incl. 16bit) or grey allowed
+   if (is_rgb16) *is_rgb16 = 0;
+   switch(bits_per_pixel) {
+      case 8:  return STBI_grey;
+      case 16: if(is_grey) return STBI_grey_alpha;
+               // fallthrough
+      case 15: if(is_rgb16) *is_rgb16 = 1;
+               return STBI_rgb;
+      case 24: // fallthrough
+      case 32: return bits_per_pixel/8;
+      default: return 0;
+   }
+}
+
+static int stbi__tga_info(stbi__context *s, int *x, int *y, int *comp)
+{
+    int tga_w, tga_h, tga_comp, tga_image_type, tga_bits_per_pixel, tga_colormap_bpp;
+    int sz, tga_colormap_type;
+    stbi__get8(s);                   // discard Offset
+    tga_colormap_type = stbi__get8(s); // colormap type
+    if( tga_colormap_type > 1 ) {
+        stbi__rewind(s);
+        return 0;      // only RGB or indexed allowed
+    }
+    tga_image_type = stbi__get8(s); // image type
+    if ( tga_colormap_type == 1 ) { // colormapped (paletted) image
+        if (tga_image_type != 1 && tga_image_type != 9) {
+            stbi__rewind(s);
+            return 0;
+        }
+        stbi__skip(s,4);       // skip index of first colormap entry and number of entries
+        sz = stbi__get8(s);    //   check bits per palette color entry
+        if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) {
+            stbi__rewind(s);
+            return 0;
+        }
+        stbi__skip(s,4);       // skip image x and y origin
+        tga_colormap_bpp = sz;
+    } else { // "normal" image w/o colormap - only RGB or grey allowed, +/- RLE
+        if ( (tga_image_type != 2) && (tga_image_type != 3) && (tga_image_type != 10) && (tga_image_type != 11) ) {
+            stbi__rewind(s);
+            return 0; // only RGB or grey allowed, +/- RLE
+        }
+        stbi__skip(s,9); // skip colormap specification and image x/y origin
+        tga_colormap_bpp = 0;
+    }
+    tga_w = stbi__get16le(s);
+    if( tga_w < 1 ) {
+        stbi__rewind(s);
+        return 0;   // test width
+    }
+    tga_h = stbi__get16le(s);
+    if( tga_h < 1 ) {
+        stbi__rewind(s);
+        return 0;   // test height
+    }
+    tga_bits_per_pixel = stbi__get8(s); // bits per pixel
+    stbi__get8(s); // ignore alpha bits
+    if (tga_colormap_bpp != 0) {
+        if((tga_bits_per_pixel != 8) && (tga_bits_per_pixel != 16)) {
+            // when using a colormap, tga_bits_per_pixel is the size of the indexes
+            // I don't think anything but 8 or 16bit indexes makes sense
+            stbi__rewind(s);
+            return 0;
+        }
+        tga_comp = stbi__tga_get_comp(tga_colormap_bpp, 0, NULL);
+    } else {
+        tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3) || (tga_image_type == 11), NULL);
+    }
+    if(!tga_comp) {
+      stbi__rewind(s);
+      return 0;
+    }
+    if (x) *x = tga_w;
+    if (y) *y = tga_h;
+    if (comp) *comp = tga_comp;
+    return 1;                   // seems to have passed everything
+}
+
+static int stbi__tga_test(stbi__context *s)
+{
+   int res = 0;
+   int sz, tga_color_type;
+   stbi__get8(s);      //   discard Offset
+   tga_color_type = stbi__get8(s);   //   color type
+   if ( tga_color_type > 1 ) goto errorEnd;   //   only RGB or indexed allowed
+   sz = stbi__get8(s);   //   image type
+   if ( tga_color_type == 1 ) { // colormapped (paletted) image
+      if (sz != 1 && sz != 9) goto errorEnd; // colortype 1 demands image type 1 or 9
+      stbi__skip(s,4);       // skip index of first colormap entry and number of entries
+      sz = stbi__get8(s);    //   check bits per palette color entry
+      if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
+      stbi__skip(s,4);       // skip image x and y origin
+   } else { // "normal" image w/o colormap
+      if ( (sz != 2) && (sz != 3) && (sz != 10) && (sz != 11) ) goto errorEnd; // only RGB or grey allowed, +/- RLE
+      stbi__skip(s,9); // skip colormap specification and image x/y origin
+   }
+   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test width
+   if ( stbi__get16le(s) < 1 ) goto errorEnd;      //   test height
+   sz = stbi__get8(s);   //   bits per pixel
+   if ( (tga_color_type == 1) && (sz != 8) && (sz != 16) ) goto errorEnd; // for colormapped images, bpp is size of an index
+   if ( (sz != 8) && (sz != 15) && (sz != 16) && (sz != 24) && (sz != 32) ) goto errorEnd;
+
+   res = 1; // if we got this far, everything's good and we can return 1 instead of 0
+
+errorEnd:
+   stbi__rewind(s);
+   return res;
+}
+
+// read 16bit value and convert to 24bit RGB
+static void stbi__tga_read_rgb16(stbi__context *s, stbi_uc* out)
+{
+   stbi__uint16 px = (stbi__uint16)stbi__get16le(s);
+   stbi__uint16 fiveBitMask = 31;
+   // we have 3 channels with 5bits each
+   int r = (px >> 10) & fiveBitMask;
+   int g = (px >> 5) & fiveBitMask;
+   int b = px & fiveBitMask;
+   // Note that this saves the data in RGB(A) order, so it doesn't need to be swapped later
+   out[0] = (stbi_uc)((r * 255)/31);
+   out[1] = (stbi_uc)((g * 255)/31);
+   out[2] = (stbi_uc)((b * 255)/31);
+
+   // some people claim that the most significant bit might be used for alpha
+   // (possibly if an alpha-bit is set in the "image descriptor byte")
+   // but that only made 16bit test images completely translucent..
+   // so let's treat all 15 and 16bit TGAs as RGB with no alpha.
+}
+
+static void *stbi__tga_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   //   read in the TGA header stuff
+   int tga_offset = stbi__get8(s);
+   int tga_indexed = stbi__get8(s);
+   int tga_image_type = stbi__get8(s);
+   int tga_is_RLE = 0;
+   int tga_palette_start = stbi__get16le(s);
+   int tga_palette_len = stbi__get16le(s);
+   int tga_palette_bits = stbi__get8(s);
+   int tga_x_origin = stbi__get16le(s);
+   int tga_y_origin = stbi__get16le(s);
+   int tga_width = stbi__get16le(s);
+   int tga_height = stbi__get16le(s);
+   int tga_bits_per_pixel = stbi__get8(s);
+   int tga_comp, tga_rgb16=0;
+   int tga_inverted = stbi__get8(s);
+   // int tga_alpha_bits = tga_inverted & 15; // the 4 lowest bits - unused (useless?)
+   //   image data
+   unsigned char *tga_data;
+   unsigned char *tga_palette = NULL;
+   int i, j;
+   unsigned char raw_data[4] = {0};
+   int RLE_count = 0;
+   int RLE_repeating = 0;
+   int read_next_pixel = 1;
+   STBI_NOTUSED(ri);
+   STBI_NOTUSED(tga_x_origin); // @TODO
+   STBI_NOTUSED(tga_y_origin); // @TODO
+
+   if (tga_height > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+   if (tga_width > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   //   do a tiny bit of preprocessing
+   if ( tga_image_type >= 8 )
+   {
+      tga_image_type -= 8;
+      tga_is_RLE = 1;
+   }
+   tga_inverted = 1 - ((tga_inverted >> 5) & 1);
+
+   //   If I'm paletted, then I'll use the number of bits from the palette
+   if ( tga_indexed ) tga_comp = stbi__tga_get_comp(tga_palette_bits, 0, &tga_rgb16);
+   else tga_comp = stbi__tga_get_comp(tga_bits_per_pixel, (tga_image_type == 3), &tga_rgb16);
+
+   if(!tga_comp) // shouldn't really happen, stbi__tga_test() should have ensured basic consistency
+      return stbi__errpuc("bad format", "Can't find out TGA pixelformat");
+
+   //   tga info
+   *x = tga_width;
+   *y = tga_height;
+   if (comp) *comp = tga_comp;
+
+   if (!stbi__mad3sizes_valid(tga_width, tga_height, tga_comp, 0))
+      return stbi__errpuc("too large", "Corrupt TGA");
+
+   tga_data = (unsigned char*)stbi__malloc_mad3(tga_width, tga_height, tga_comp, 0);
+   if (!tga_data) return stbi__errpuc("outofmem", "Out of memory");
+
+   // skip to the data's starting position (offset usually = 0)
+   stbi__skip(s, tga_offset );
+
+   if ( !tga_indexed && !tga_is_RLE && !tga_rgb16 ) {
+      for (i=0; i < tga_height; ++i) {
+         int row = tga_inverted ? tga_height -i - 1 : i;
+         stbi_uc *tga_row = tga_data + row*tga_width*tga_comp;
+         stbi__getn(s, tga_row, tga_width * tga_comp);
+      }
+   } else  {
+      //   do I need to load a palette?
+      if ( tga_indexed)
+      {
+         if (tga_palette_len == 0) {  /* you have to have at least one entry! */
+            STBI_FREE(tga_data);
+            return stbi__errpuc("bad palette", "Corrupt TGA");
+         }
+
+         //   any data to skip? (offset usually = 0)
+         stbi__skip(s, tga_palette_start );
+         //   load the palette
+         tga_palette = (unsigned char*)stbi__malloc_mad2(tga_palette_len, tga_comp, 0);
+         if (!tga_palette) {
+            STBI_FREE(tga_data);
+            return stbi__errpuc("outofmem", "Out of memory");
+         }
+         if (tga_rgb16) {
+            stbi_uc *pal_entry = tga_palette;
+            STBI_ASSERT(tga_comp == STBI_rgb);
+            for (i=0; i < tga_palette_len; ++i) {
+               stbi__tga_read_rgb16(s, pal_entry);
+               pal_entry += tga_comp;
+            }
+         } else if (!stbi__getn(s, tga_palette, tga_palette_len * tga_comp)) {
+               STBI_FREE(tga_data);
+               STBI_FREE(tga_palette);
+               return stbi__errpuc("bad palette", "Corrupt TGA");
+         }
+      }
+      //   load the data
+      for (i=0; i < tga_width * tga_height; ++i)
+      {
+         //   if I'm in RLE mode, do I need to get a RLE chunk?
+         if ( tga_is_RLE )
+         {
+            if ( RLE_count == 0 )
+            {
+               //   yep, get the next byte as a RLE command
+               int RLE_cmd = stbi__get8(s);
+               RLE_count = 1 + (RLE_cmd & 127);
+               RLE_repeating = RLE_cmd >> 7;
+               read_next_pixel = 1;
+            } else if ( !RLE_repeating )
+            {
+               read_next_pixel = 1;
+            }
+         } else
+         {
+            read_next_pixel = 1;
+         }
+         //   OK, if I need to read a pixel, do it now
+         if ( read_next_pixel )
+         {
+            //   load however much data we did have
+            if ( tga_indexed )
+            {
+               // read in index, then perform the lookup
+               int pal_idx = (tga_bits_per_pixel == 8) ? stbi__get8(s) : stbi__get16le(s);
+               if ( pal_idx >= tga_palette_len ) {
+                  // invalid index
+                  pal_idx = 0;
+               }
+               pal_idx *= tga_comp;
+               for (j = 0; j < tga_comp; ++j) {
+                  raw_data[j] = tga_palette[pal_idx+j];
+               }
+            } else if(tga_rgb16) {
+               STBI_ASSERT(tga_comp == STBI_rgb);
+               stbi__tga_read_rgb16(s, raw_data);
+            } else {
+               //   read in the data raw
+               for (j = 0; j < tga_comp; ++j) {
+                  raw_data[j] = stbi__get8(s);
+               }
+            }
+            //   clear the reading flag for the next pixel
+            read_next_pixel = 0;
+         } // end of reading a pixel
+
+         // copy data
+         for (j = 0; j < tga_comp; ++j)
+           tga_data[i*tga_comp+j] = raw_data[j];
+
+         //   in case we're in RLE mode, keep counting down
+         --RLE_count;
+      }
+      //   do I need to invert the image?
+      if ( tga_inverted )
+      {
+         for (j = 0; j*2 < tga_height; ++j)
+         {
+            int index1 = j * tga_width * tga_comp;
+            int index2 = (tga_height - 1 - j) * tga_width * tga_comp;
+            for (i = tga_width * tga_comp; i > 0; --i)
+            {
+               unsigned char temp = tga_data[index1];
+               tga_data[index1] = tga_data[index2];
+               tga_data[index2] = temp;
+               ++index1;
+               ++index2;
+            }
+         }
+      }
+      //   clear my palette, if I had one
+      if ( tga_palette != NULL )
+      {
+         STBI_FREE( tga_palette );
+      }
+   }
+
+   // swap RGB - if the source data was RGB16, it already is in the right order
+   if (tga_comp >= 3 && !tga_rgb16)
+   {
+      unsigned char* tga_pixel = tga_data;
+      for (i=0; i < tga_width * tga_height; ++i)
+      {
+         unsigned char temp = tga_pixel[0];
+         tga_pixel[0] = tga_pixel[2];
+         tga_pixel[2] = temp;
+         tga_pixel += tga_comp;
+      }
+   }
+
+   // convert to target component count
+   if (req_comp && req_comp != tga_comp)
+      tga_data = stbi__convert_format(tga_data, tga_comp, req_comp, tga_width, tga_height);
+
+   //   the things I do to get rid of an error message, and yet keep
+   //   Microsoft's C compilers happy... [8^(
+   tga_palette_start = tga_palette_len = tga_palette_bits =
+         tga_x_origin = tga_y_origin = 0;
+   STBI_NOTUSED(tga_palette_start);
+   //   OK, done
+   return tga_data;
+}
+#endif
+
+// *************************************************************************************************
+// Photoshop PSD loader -- PD by Thatcher Ulrich, integration by Nicolas Schulz, tweaked by STB
+
+#ifndef STBI_NO_PSD
+static int stbi__psd_test(stbi__context *s)
+{
+   int r = (stbi__get32be(s) == 0x38425053);
+   stbi__rewind(s);
+   return r;
+}
+
+static int stbi__psd_decode_rle(stbi__context *s, stbi_uc *p, int pixelCount)
+{
+   int count, nleft, len;
+
+   count = 0;
+   while ((nleft = pixelCount - count) > 0) {
+      len = stbi__get8(s);
+      if (len == 128) {
+         // No-op.
+      } else if (len < 128) {
+         // Copy next len+1 bytes literally.
+         len++;
+         if (len > nleft) return 0; // corrupt data
+         count += len;
+         while (len) {
+            *p = stbi__get8(s);
+            p += 4;
+            len--;
+         }
+      } else if (len > 128) {
+         stbi_uc   val;
+         // Next -len+1 bytes in the dest are replicated from next source byte.
+         // (Interpret len as a negative 8-bit int.)
+         len = 257 - len;
+         if (len > nleft) return 0; // corrupt data
+         val = stbi__get8(s);
+         count += len;
+         while (len) {
+            *p = val;
+            p += 4;
+            len--;
+         }
+      }
+   }
+
+   return 1;
+}
+
+static void *stbi__psd_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri, int bpc)
+{
+   int pixelCount;
+   int channelCount, compression;
+   int channel, i;
+   int bitdepth;
+   int w,h;
+   stbi_uc *out;
+   STBI_NOTUSED(ri);
+
+   // Check identifier
+   if (stbi__get32be(s) != 0x38425053)   // "8BPS"
+      return stbi__errpuc("not PSD", "Corrupt PSD image");
+
+   // Check file type version.
+   if (stbi__get16be(s) != 1)
+      return stbi__errpuc("wrong version", "Unsupported version of PSD image");
+
+   // Skip 6 reserved bytes.
+   stbi__skip(s, 6 );
+
+   // Read the number of channels (R, G, B, A, etc).
+   channelCount = stbi__get16be(s);
+   if (channelCount < 0 || channelCount > 16)
+      return stbi__errpuc("wrong channel count", "Unsupported number of channels in PSD image");
+
+   // Read the rows and columns of the image.
+   h = stbi__get32be(s);
+   w = stbi__get32be(s);
+
+   if (h > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+   if (w > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   // Make sure the depth is 8 or 16 bits.
+   bitdepth = stbi__get16be(s);
+   if (bitdepth != 8 && bitdepth != 16)
+      return stbi__errpuc("unsupported bit depth", "PSD bit depth is not 8 or 16 bit");
+
+   // Make sure the color mode is RGB.
+   // Valid options are:
+   //   0: Bitmap
+   //   1: Grayscale
+   //   2: Indexed color
+   //   3: RGB color
+   //   4: CMYK color
+   //   7: Multichannel
+   //   8: Duotone
+   //   9: Lab color
+   if (stbi__get16be(s) != 3)
+      return stbi__errpuc("wrong color format", "PSD is not in RGB color format");
+
+   // Skip the Mode Data.  (It's the palette for indexed color; other info for other modes.)
+   stbi__skip(s,stbi__get32be(s) );
+
+   // Skip the image resources.  (resolution, pen tool paths, etc)
+   stbi__skip(s, stbi__get32be(s) );
+
+   // Skip the reserved data.
+   stbi__skip(s, stbi__get32be(s) );
+
+   // Find out if the data is compressed.
+   // Known values:
+   //   0: no compression
+   //   1: RLE compressed
+   compression = stbi__get16be(s);
+   if (compression > 1)
+      return stbi__errpuc("bad compression", "PSD has an unknown compression format");
+
+   // Check size
+   if (!stbi__mad3sizes_valid(4, w, h, 0))
+      return stbi__errpuc("too large", "Corrupt PSD");
+
+   // Create the destination image.
+
+   if (!compression && bitdepth == 16 && bpc == 16) {
+      out = (stbi_uc *) stbi__malloc_mad3(8, w, h, 0);
+      ri->bits_per_channel = 16;
+   } else
+      out = (stbi_uc *) stbi__malloc(4 * w*h);
+
+   if (!out) return stbi__errpuc("outofmem", "Out of memory");
+   pixelCount = w*h;
+
+   // Initialize the data to zero.
+   //memset( out, 0, pixelCount * 4 );
+
+   // Finally, the image data.
+   if (compression) {
+      // RLE as used by .PSD and .TIFF
+      // Loop until you get the number of unpacked bytes you are expecting:
+      //     Read the next source byte into n.
+      //     If n is between 0 and 127 inclusive, copy the next n+1 bytes literally.
+      //     Else if n is between -127 and -1 inclusive, copy the next byte -n+1 times.
+      //     Else if n is -128 (byte value 128), noop.
+      // Endloop
+
+      // The RLE-compressed data is preceded by a 2-byte data count for each row in the data,
+      // which we're going to just skip.
+      stbi__skip(s, h * channelCount * 2 );
+
+      // Read the RLE data by channel.
+      for (channel = 0; channel < 4; channel++) {
+         stbi_uc *p;
+
+         p = out+channel;
+         if (channel >= channelCount) {
+            // Fill this channel with default data.
+            for (i = 0; i < pixelCount; i++, p += 4)
+               *p = (channel == 3 ? 255 : 0);
+         } else {
+            // Read the RLE data.
+            if (!stbi__psd_decode_rle(s, p, pixelCount)) {
+               STBI_FREE(out);
+               return stbi__errpuc("corrupt", "bad RLE data");
+            }
+         }
+      }
+
+   } else {
+      // We're at the raw image data.  It's each channel in order (Red, Green, Blue, Alpha, ...)
+      // where each channel consists of an 8-bit (or 16-bit) value for each pixel in the image.
+
+      // Read the data by channel.
+      for (channel = 0; channel < 4; channel++) {
+         if (channel >= channelCount) {
+            // Fill this channel with default data.
+            if (bitdepth == 16 && bpc == 16) {
+               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
+               stbi__uint16 val = channel == 3 ? 65535 : 0;
+               for (i = 0; i < pixelCount; i++, q += 4)
+                  *q = val;
+            } else {
+               stbi_uc *p = out+channel;
+               stbi_uc val = channel == 3 ? 255 : 0;
+               for (i = 0; i < pixelCount; i++, p += 4)
+                  *p = val;
+            }
+         } else {
+            if (ri->bits_per_channel == 16) {    // output bpc
+               stbi__uint16 *q = ((stbi__uint16 *) out) + channel;
+               for (i = 0; i < pixelCount; i++, q += 4)
+                  *q = (stbi__uint16) stbi__get16be(s);
+            } else {
+               stbi_uc *p = out+channel;
+               if (bitdepth == 16) {  // input bpc
+                  for (i = 0; i < pixelCount; i++, p += 4)
+                     *p = (stbi_uc) (stbi__get16be(s) >> 8);
+               } else {
+                  for (i = 0; i < pixelCount; i++, p += 4)
+                     *p = stbi__get8(s);
+               }
+            }
+         }
+      }
+   }
+
+   // remove weird white matte from PSD
+   if (channelCount >= 4) {
+      if (ri->bits_per_channel == 16) {
+         for (i=0; i < w*h; ++i) {
+            stbi__uint16 *pixel = (stbi__uint16 *) out + 4*i;
+            if (pixel[3] != 0 && pixel[3] != 65535) {
+               float a = pixel[3] / 65535.0f;
+               float ra = 1.0f / a;
+               float inv_a = 65535.0f * (1 - ra);
+               pixel[0] = (stbi__uint16) (pixel[0]*ra + inv_a);
+               pixel[1] = (stbi__uint16) (pixel[1]*ra + inv_a);
+               pixel[2] = (stbi__uint16) (pixel[2]*ra + inv_a);
+            }
+         }
+      } else {
+         for (i=0; i < w*h; ++i) {
+            unsigned char *pixel = out + 4*i;
+            if (pixel[3] != 0 && pixel[3] != 255) {
+               float a = pixel[3] / 255.0f;
+               float ra = 1.0f / a;
+               float inv_a = 255.0f * (1 - ra);
+               pixel[0] = (unsigned char) (pixel[0]*ra + inv_a);
+               pixel[1] = (unsigned char) (pixel[1]*ra + inv_a);
+               pixel[2] = (unsigned char) (pixel[2]*ra + inv_a);
+            }
+         }
+      }
+   }
+
+   // convert to desired output format
+   if (req_comp && req_comp != 4) {
+      if (ri->bits_per_channel == 16)
+         out = (stbi_uc *) stbi__convert_format16((stbi__uint16 *) out, 4, req_comp, w, h);
+      else
+         out = stbi__convert_format(out, 4, req_comp, w, h);
+      if (out == NULL) return out; // stbi__convert_format frees input on failure
+   }
+
+   if (comp) *comp = 4;
+   *y = h;
+   *x = w;
+
+   return out;
+}
+#endif
+
+// *************************************************************************************************
+// Softimage PIC loader
+// by Tom Seddon
+//
+// See http://softimage.wiki.softimage.com/index.php/INFO:_PIC_file_format
+// See http://ozviz.wasp.uwa.edu.au/~pbourke/dataformats/softimagepic/
+
+#ifndef STBI_NO_PIC
+static int stbi__pic_is4(stbi__context *s,const char *str)
+{
+   int i;
+   for (i=0; i<4; ++i)
+      if (stbi__get8(s) != (stbi_uc)str[i])
+         return 0;
+
+   return 1;
+}
+
+static int stbi__pic_test_core(stbi__context *s)
+{
+   int i;
+
+   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34"))
+      return 0;
+
+   for(i=0;i<84;++i)
+      stbi__get8(s);
+
+   if (!stbi__pic_is4(s,"PICT"))
+      return 0;
+
+   return 1;
+}
+
+typedef struct
+{
+   stbi_uc size,type,channel;
+} stbi__pic_packet;
+
+static stbi_uc *stbi__readval(stbi__context *s, int channel, stbi_uc *dest)
+{
+   int mask=0x80, i;
+
+   for (i=0; i<4; ++i, mask>>=1) {
+      if (channel & mask) {
+         if (stbi__at_eof(s)) return stbi__errpuc("bad file","PIC file too short");
+         dest[i]=stbi__get8(s);
+      }
+   }
+
+   return dest;
+}
+
+static void stbi__copyval(int channel,stbi_uc *dest,const stbi_uc *src)
+{
+   int mask=0x80,i;
+
+   for (i=0;i<4; ++i, mask>>=1)
+      if (channel&mask)
+         dest[i]=src[i];
+}
+
+static stbi_uc *stbi__pic_load_core(stbi__context *s,int width,int height,int *comp, stbi_uc *result)
+{
+   int act_comp=0,num_packets=0,y,chained;
+   stbi__pic_packet packets[10];
+
+   // this will (should...) cater for even some bizarre stuff like having data
+   // for the same channel in multiple packets.
+   do {
+      stbi__pic_packet *packet;
+
+      if (num_packets==sizeof(packets)/sizeof(packets[0]))
+         return stbi__errpuc("bad format","too many packets");
+
+      packet = &packets[num_packets++];
+
+      chained = stbi__get8(s);
+      packet->size    = stbi__get8(s);
+      packet->type    = stbi__get8(s);
+      packet->channel = stbi__get8(s);
+
+      act_comp |= packet->channel;
+
+      if (stbi__at_eof(s))          return stbi__errpuc("bad file","file too short (reading packets)");
+      if (packet->size != 8)  return stbi__errpuc("bad format","packet isn't 8bpp");
+   } while (chained);
+
+   *comp = (act_comp & 0x10 ? 4 : 3); // has alpha channel?
+
+   for(y=0; y<height; ++y) {
+      int packet_idx;
+
+      for(packet_idx=0; packet_idx < num_packets; ++packet_idx) {
+         stbi__pic_packet *packet = &packets[packet_idx];
+         stbi_uc *dest = result+y*width*4;
+
+         switch (packet->type) {
+            default:
+               return stbi__errpuc("bad format","packet has bad compression type");
+
+            case 0: {//uncompressed
+               int x;
+
+               for(x=0;x<width;++x, dest+=4)
+                  if (!stbi__readval(s,packet->channel,dest))
+                     return 0;
+               break;
+            }
+
+            case 1://Pure RLE
+               {
+                  int left=width, i;
+
+                  while (left>0) {
+                     stbi_uc count,value[4];
+
+                     count=stbi__get8(s);
+                     if (stbi__at_eof(s))   return stbi__errpuc("bad file","file too short (pure read count)");
+
+                     if (count > left)
+                        count = (stbi_uc) left;
+
+                     if (!stbi__readval(s,packet->channel,value))  return 0;
+
+                     for(i=0; i<count; ++i,dest+=4)
+                        stbi__copyval(packet->channel,dest,value);
+                     left -= count;
+                  }
+               }
+               break;
+
+            case 2: {//Mixed RLE
+               int left=width;
+               while (left>0) {
+                  int count = stbi__get8(s), i;
+                  if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (mixed read count)");
+
+                  if (count >= 128) { // Repeated
+                     stbi_uc value[4];
+
+                     if (count==128)
+                        count = stbi__get16be(s);
+                     else
+                        count -= 127;
+                     if (count > left)
+                        return stbi__errpuc("bad file","scanline overrun");
+
+                     if (!stbi__readval(s,packet->channel,value))
+                        return 0;
+
+                     for(i=0;i<count;++i, dest += 4)
+                        stbi__copyval(packet->channel,dest,value);
+                  } else { // Raw
+                     ++count;
+                     if (count>left) return stbi__errpuc("bad file","scanline overrun");
+
+                     for(i=0;i<count;++i, dest+=4)
+                        if (!stbi__readval(s,packet->channel,dest))
+                           return 0;
+                  }
+                  left-=count;
+               }
+               break;
+            }
+         }
+      }
+   }
+
+   return result;
+}
+
+static void *stbi__pic_load(stbi__context *s,int *px,int *py,int *comp,int req_comp, stbi__result_info *ri)
+{
+   stbi_uc *result;
+   int i, x,y, internal_comp;
+   STBI_NOTUSED(ri);
+
+   if (!comp) comp = &internal_comp;
+
+   for (i=0; i<92; ++i)
+      stbi__get8(s);
+
+   x = stbi__get16be(s);
+   y = stbi__get16be(s);
+
+   if (y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+   if (x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   if (stbi__at_eof(s))  return stbi__errpuc("bad file","file too short (pic header)");
+   if (!stbi__mad3sizes_valid(x, y, 4, 0)) return stbi__errpuc("too large", "PIC image too large to decode");
+
+   stbi__get32be(s); //skip `ratio'
+   stbi__get16be(s); //skip `fields'
+   stbi__get16be(s); //skip `pad'
+
+   // intermediate buffer is RGBA
+   result = (stbi_uc *) stbi__malloc_mad3(x, y, 4, 0);
+   if (!result) return stbi__errpuc("outofmem", "Out of memory");
+   memset(result, 0xff, x*y*4);
+
+   if (!stbi__pic_load_core(s,x,y,comp, result)) {
+      STBI_FREE(result);
+      return 0; // bail out: stbi__convert_format must not be handed a NULL buffer
+   }
+   *px = x;
+   *py = y;
+   if (req_comp == 0) req_comp = *comp;
+   result=stbi__convert_format(result,4,req_comp,x,y);
+
+   return result;
+}
+
+static int stbi__pic_test(stbi__context *s)
+{
+   int r = stbi__pic_test_core(s);
+   stbi__rewind(s);
+   return r;
+}
+#endif
+
+// *************************************************************************************************
+// GIF loader -- public domain by Jean-Marc Lienher -- simplified/shrunk by stb
+
+#ifndef STBI_NO_GIF
+typedef struct
+{
+   stbi__int16 prefix;
+   stbi_uc first;
+   stbi_uc suffix;
+} stbi__gif_lzw;
+
+typedef struct
+{
+   int w,h;
+   stbi_uc *out;                 // output buffer (always 4 components)
+   stbi_uc *background;          // The current "background" as far as a gif is concerned
+   stbi_uc *history;
+   int flags, bgindex, ratio, transparent, eflags;
+   stbi_uc  pal[256][4];
+   stbi_uc lpal[256][4];
+   stbi__gif_lzw codes[8192];
+   stbi_uc *color_table;
+   int parse, step;
+   int lflags;
+   int start_x, start_y;
+   int max_x, max_y;
+   int cur_x, cur_y;
+   int line_size;
+   int delay;
+} stbi__gif;
+
+static int stbi__gif_test_raw(stbi__context *s)
+{
+   int sz;
+   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8') return 0;
+   sz = stbi__get8(s);
+   if (sz != '9' && sz != '7') return 0;
+   if (stbi__get8(s) != 'a') return 0;
+   return 1;
+}
+
+static int stbi__gif_test(stbi__context *s)
+{
+   int r = stbi__gif_test_raw(s);
+   stbi__rewind(s);
+   return r;
+}
+
+static void stbi__gif_parse_colortable(stbi__context *s, stbi_uc pal[256][4], int num_entries, int transp)
+{
+   int i;
+   for (i=0; i < num_entries; ++i) {
+      pal[i][2] = stbi__get8(s);
+      pal[i][1] = stbi__get8(s);
+      pal[i][0] = stbi__get8(s);
+      pal[i][3] = transp == i ? 0 : 255;
+   }
+}
+
+static int stbi__gif_header(stbi__context *s, stbi__gif *g, int *comp, int is_info)
+{
+   stbi_uc version;
+   if (stbi__get8(s) != 'G' || stbi__get8(s) != 'I' || stbi__get8(s) != 'F' || stbi__get8(s) != '8')
+      return stbi__err("not GIF", "Corrupt GIF");
+
+   version = stbi__get8(s);
+   if (version != '7' && version != '9')    return stbi__err("not GIF", "Corrupt GIF");
+   if (stbi__get8(s) != 'a')                return stbi__err("not GIF", "Corrupt GIF");
+
+   stbi__g_failure_reason = "";
+   g->w = stbi__get16le(s);
+   g->h = stbi__get16le(s);
+   g->flags = stbi__get8(s);
+   g->bgindex = stbi__get8(s);
+   g->ratio = stbi__get8(s);
+   g->transparent = -1;
+
+   if (g->w > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+   if (g->h > STBI_MAX_DIMENSIONS) return stbi__err("too large","Very large image (corrupt?)");
+
+   if (comp != 0) *comp = 4;  // can't actually tell whether it's 3 or 4 until we parse the comments
+
+   if (is_info) return 1;
+
+   if (g->flags & 0x80)
+      stbi__gif_parse_colortable(s,g->pal, 2 << (g->flags & 7), -1);
+
+   return 1;
+}
+
+static int stbi__gif_info_raw(stbi__context *s, int *x, int *y, int *comp)
+{
+   stbi__gif* g = (stbi__gif*) stbi__malloc(sizeof(stbi__gif));
+   if (!g) return stbi__err("outofmem", "Out of memory");
+   if (!stbi__gif_header(s, g, comp, 1)) {
+      STBI_FREE(g);
+      stbi__rewind( s );
+      return 0;
+   }
+   if (x) *x = g->w;
+   if (y) *y = g->h;
+   STBI_FREE(g);
+   return 1;
+}
+
+static void stbi__out_gif_code(stbi__gif *g, stbi__uint16 code)
+{
+   stbi_uc *p, *c;
+   int idx;
+
+   // recurse to decode the prefixes, since the linked-list is backwards,
+   // and working backwards through an interleaved image would be nasty
+   if (g->codes[code].prefix >= 0)
+      stbi__out_gif_code(g, g->codes[code].prefix);
+
+   if (g->cur_y >= g->max_y) return;
+
+   idx = g->cur_x + g->cur_y;
+   p = &g->out[idx];
+   g->history[idx / 4] = 1;
+
+   c = &g->color_table[g->codes[code].suffix * 4];
+   if (c[3] > 128) { // don't render transparent pixels;
+      p[0] = c[2];
+      p[1] = c[1];
+      p[2] = c[0];
+      p[3] = c[3];
+   }
+   g->cur_x += 4;
+
+   if (g->cur_x >= g->max_x) {
+      g->cur_x = g->start_x;
+      g->cur_y += g->step;
+
+      while (g->cur_y >= g->max_y && g->parse > 0) {
+         g->step = (1 << g->parse) * g->line_size;
+         g->cur_y = g->start_y + (g->step >> 1);
+         --g->parse;
+      }
+   }
+}
+
+static stbi_uc *stbi__process_gif_raster(stbi__context *s, stbi__gif *g)
+{
+   stbi_uc lzw_cs;
+   stbi__int32 len, init_code;
+   stbi__uint32 first;
+   stbi__int32 codesize, codemask, avail, oldcode, bits, valid_bits, clear;
+   stbi__gif_lzw *p;
+
+   lzw_cs = stbi__get8(s);
+   if (lzw_cs > 12) return NULL;
+   clear = 1 << lzw_cs;
+   first = 1;
+   codesize = lzw_cs + 1;
+   codemask = (1 << codesize) - 1;
+   bits = 0;
+   valid_bits = 0;
+   for (init_code = 0; init_code < clear; init_code++) {
+      g->codes[init_code].prefix = -1;
+      g->codes[init_code].first = (stbi_uc) init_code;
+      g->codes[init_code].suffix = (stbi_uc) init_code;
+   }
+
+   // support no starting clear code
+   avail = clear+2;
+   oldcode = -1;
+
+   len = 0;
+   for(;;) {
+      if (valid_bits < codesize) {
+         if (len == 0) {
+            len = stbi__get8(s); // start new block
+            if (len == 0)
+               return g->out;
+         }
+         --len;
+         bits |= (stbi__int32) stbi__get8(s) << valid_bits;
+         valid_bits += 8;
+      } else {
+         stbi__int32 code = bits & codemask;
+         bits >>= codesize;
+         valid_bits -= codesize;
+         // @OPTIMIZE: is there some way we can accelerate the non-clear path?
+         if (code == clear) {  // clear code
+            codesize = lzw_cs + 1;
+            codemask = (1 << codesize) - 1;
+            avail = clear + 2;
+            oldcode = -1;
+            first = 0;
+         } else if (code == clear + 1) { // end of stream code
+            stbi__skip(s, len);
+            while ((len = stbi__get8(s)) > 0)
+               stbi__skip(s,len);
+            return g->out;
+         } else if (code <= avail) {
+            if (first) {
+               return stbi__errpuc("no clear code", "Corrupt GIF");
+            }
+
+            if (oldcode >= 0) {
+               p = &g->codes[avail++];
+               if (avail > 8192) {
+                  return stbi__errpuc("too many codes", "Corrupt GIF");
+               }
+
+               p->prefix = (stbi__int16) oldcode;
+               p->first = g->codes[oldcode].first;
+               p->suffix = (code == avail) ? p->first : g->codes[code].first;
+            } else if (code == avail)
+               return stbi__errpuc("illegal code in raster", "Corrupt GIF");
+
+            stbi__out_gif_code(g, (stbi__uint16) code);
+
+            if ((avail & codemask) == 0 && avail <= 0x0FFF) {
+               codesize++;
+               codemask = (1 << codesize) - 1;
+            }
+
+            oldcode = code;
+         } else {
+            return stbi__errpuc("illegal code in raster", "Corrupt GIF");
+         }
+      }
+   }
+}
+
+// this function is designed to support animated gifs, although stb_image doesn't support it
+// two back is the image from two frames ago, used for a very specific disposal format
+static stbi_uc *stbi__gif_load_next(stbi__context *s, stbi__gif *g, int *comp, int req_comp, stbi_uc *two_back)
+{
+   int dispose;
+   int first_frame;
+   int pi;
+   int pcount;
+   STBI_NOTUSED(req_comp);
+
+   // on first frame, any non-written pixels get the background colour (non-transparent)
+   first_frame = 0;
+   if (g->out == 0) {
+      if (!stbi__gif_header(s, g, comp,0)) return 0; // stbi__g_failure_reason set by stbi__gif_header
+      if (!stbi__mad3sizes_valid(4, g->w, g->h, 0))
+         return stbi__errpuc("too large", "GIF image is too large");
+      pcount = g->w * g->h;
+      g->out = (stbi_uc *) stbi__malloc(4 * pcount);
+      g->background = (stbi_uc *) stbi__malloc(4 * pcount);
+      g->history = (stbi_uc *) stbi__malloc(pcount);
+      if (!g->out || !g->background || !g->history)
+         return stbi__errpuc("outofmem", "Out of memory");
+
+      // image is treated as "transparent" at the start - ie, nothing overwrites the current background;
+      // background colour is only used for pixels that are not rendered first frame, after that "background"
+      // color refers to the color that was there the previous frame.
+      memset(g->out, 0x00, 4 * pcount);
+      memset(g->background, 0x00, 4 * pcount); // state of the background (starts transparent)
+      memset(g->history, 0x00, pcount);        // pixels that were affected previous frame
+      first_frame = 1;
+   } else {
+      // second frame - how do we dispose of the previous one?
+      dispose = (g->eflags & 0x1C) >> 2;
+      pcount = g->w * g->h;
+
+      if ((dispose == 3) && (two_back == 0)) {
+         dispose = 2; // if I don't have an image to revert back to, default to the old background
+      }
+
+      if (dispose == 3) { // use previous graphic
+         for (pi = 0; pi < pcount; ++pi) {
+            if (g->history[pi]) {
+               memcpy( &g->out[pi * 4], &two_back[pi * 4], 4 );
+            }
+         }
+      } else if (dispose == 2) {
+         // restore what was changed last frame to background before that frame;
+         for (pi = 0; pi < pcount; ++pi) {
+            if (g->history[pi]) {
+               memcpy( &g->out[pi * 4], &g->background[pi * 4], 4 );
+            }
+         }
+      } else {
+         // This is a non-disposal case either way, so just
+         // leave the pixels as is, and they will become the new background
+         // 1: do not dispose
+         // 0:  not specified.
+      }
+
+      // background is what out is after the undoing of the previous frame;
+      memcpy( g->background, g->out, 4 * g->w * g->h );
+   }
+
+   // clear my history;
+   memset( g->history, 0x00, g->w * g->h );        // pixels that were affected previous frame
+
+   for (;;) {
+      int tag = stbi__get8(s);
+      switch (tag) {
+         case 0x2C: /* Image Descriptor */
+         {
+            stbi__int32 x, y, w, h;
+            stbi_uc *o;
+
+            x = stbi__get16le(s);
+            y = stbi__get16le(s);
+            w = stbi__get16le(s);
+            h = stbi__get16le(s);
+            if (((x + w) > (g->w)) || ((y + h) > (g->h)))
+               return stbi__errpuc("bad Image Descriptor", "Corrupt GIF");
+
+            g->line_size = g->w * 4;
+            g->start_x = x * 4;
+            g->start_y = y * g->line_size;
+            g->max_x   = g->start_x + w * 4;
+            g->max_y   = g->start_y + h * g->line_size;
+            g->cur_x   = g->start_x;
+            g->cur_y   = g->start_y;
+
+            // if the width of the specified rectangle is 0, that means
+            // we may not see *any* pixels or the image is malformed;
+            // to make sure this is caught, move the current y down to
+            // max_y (which is what out_gif_code checks).
+            if (w == 0)
+               g->cur_y = g->max_y;
+
+            g->lflags = stbi__get8(s);
+
+            if (g->lflags & 0x40) {
+               g->step = 8 * g->line_size; // first interlaced spacing
+               g->parse = 3;
+            } else {
+               g->step = g->line_size;
+               g->parse = 0;
+            }
+
+            if (g->lflags & 0x80) {
+               stbi__gif_parse_colortable(s,g->lpal, 2 << (g->lflags & 7), g->eflags & 0x01 ? g->transparent : -1);
+               g->color_table = (stbi_uc *) g->lpal;
+            } else if (g->flags & 0x80) {
+               g->color_table = (stbi_uc *) g->pal;
+            } else
+               return stbi__errpuc("missing color table", "Corrupt GIF");
+
+            o = stbi__process_gif_raster(s, g);
+            if (!o) return NULL;
+
+            // if this was the first frame, fill in the pixels it left untouched
+            pcount = g->w * g->h;
+            if (first_frame && (g->bgindex > 0)) {
+               // if first frame, any pixel not drawn to gets the background color
+               for (pi = 0; pi < pcount; ++pi) {
+                  if (g->history[pi] == 0) {
+                     g->pal[g->bgindex][3] = 255; // just in case it was made transparent, undo that; It will be reset next frame if need be;
+                     memcpy( &g->out[pi * 4], &g->pal[g->bgindex], 4 );
+                  }
+               }
+            }
+
+            return o;
+         }
+
+         case 0x21: // Extension Introducer (Graphic Control, Comment, etc.).
+         {
+            int len;
+            int ext = stbi__get8(s);
+            if (ext == 0xF9) { // Graphic Control Extension.
+               len = stbi__get8(s);
+               if (len == 4) {
+                  g->eflags = stbi__get8(s);
+                  g->delay = 10 * stbi__get16le(s); // delay - 1/100th of a second, saving as 1/1000ths.
+
+                  // unset old transparent
+                  if (g->transparent >= 0) {
+                     g->pal[g->transparent][3] = 255;
+                  }
+                  if (g->eflags & 0x01) {
+                     g->transparent = stbi__get8(s);
+                     if (g->transparent >= 0) {
+                        g->pal[g->transparent][3] = 0;
+                     }
+                  } else {
+                     // don't need transparent
+                     stbi__skip(s, 1);
+                     g->transparent = -1;
+                  }
+               } else {
+                  stbi__skip(s, len);
+                  break;
+               }
+            }
+            while ((len = stbi__get8(s)) != 0) {
+               stbi__skip(s, len);
+            }
+            break;
+         }
+
+         case 0x3B: // gif stream termination code
+            return (stbi_uc *) s; // using '1' causes warning on some compilers
+
+         default:
+            return stbi__errpuc("unknown code", "Corrupt GIF");
+      }
+   }
+}
+
+static void *stbi__load_gif_main_outofmem(stbi__gif *g, stbi_uc *out, int **delays)
+{
+   STBI_FREE(g->out);
+   STBI_FREE(g->history);
+   STBI_FREE(g->background);
+
+   if (out) STBI_FREE(out);
+   if (delays && *delays) STBI_FREE(*delays);
+   return stbi__errpuc("outofmem", "Out of memory");
+}
+
+static void *stbi__load_gif_main(stbi__context *s, int **delays, int *x, int *y, int *z, int *comp, int req_comp)
+{
+   if (stbi__gif_test(s)) {
+      int layers = 0;
+      stbi_uc *u = 0;
+      stbi_uc *out = 0;
+      stbi_uc *two_back = 0;
+      stbi__gif g;
+      int stride;
+      int out_size = 0;
+      int delays_size = 0;
+
+      STBI_NOTUSED(out_size);
+      STBI_NOTUSED(delays_size);
+
+      memset(&g, 0, sizeof(g));
+      if (delays) {
+         *delays = 0;
+      }
+
+      do {
+         u = stbi__gif_load_next(s, &g, comp, req_comp, two_back);
+         if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
+
+         if (u) {
+            *x = g.w;
+            *y = g.h;
+            ++layers;
+            stride = g.w * g.h * 4;
+
+            if (out) {
+               void *tmp = (stbi_uc*) STBI_REALLOC_SIZED( out, out_size, layers * stride );
+               if (!tmp)
+                  return stbi__load_gif_main_outofmem(&g, out, delays);
+               else {
+                   out = (stbi_uc*) tmp;
+                   out_size = layers * stride;
+               }
+
+               if (delays) {
+                  int *new_delays = (int*) STBI_REALLOC_SIZED( *delays, delays_size, sizeof(int) * layers );
+                  if (!new_delays)
+                     return stbi__load_gif_main_outofmem(&g, out, delays);
+                  *delays = new_delays;
+                  delays_size = layers * sizeof(int);
+               }
+            } else {
+               out = (stbi_uc*)stbi__malloc( layers * stride );
+               if (!out)
+                  return stbi__load_gif_main_outofmem(&g, out, delays);
+               out_size = layers * stride;
+               if (delays) {
+                  *delays = (int*) stbi__malloc( layers * sizeof(int) );
+                  if (!*delays)
+                     return stbi__load_gif_main_outofmem(&g, out, delays);
+                  delays_size = layers * sizeof(int);
+               }
+            }
+            memcpy( out + ((layers - 1) * stride), u, stride );
+            if (layers >= 2) {
+               two_back = out + (layers - 2) * stride; // frame before the one just stored
+            }
+
+            if (delays) {
+               (*delays)[layers - 1U] = g.delay;
+            }
+         }
+      } while (u != 0);
+
+      // free temp buffer;
+      STBI_FREE(g.out);
+      STBI_FREE(g.history);
+      STBI_FREE(g.background);
+
+      // do the final conversion after loading everything;
+      if (req_comp && req_comp != 4)
+         out = stbi__convert_format(out, 4, req_comp, layers * g.w, g.h);
+
+      *z = layers;
+      return out;
+   } else {
+      return stbi__errpuc("not GIF", "Image was not a GIF.");
+   }
+}
+
+static void *stbi__gif_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   stbi_uc *u = 0;
+   stbi__gif g;
+   memset(&g, 0, sizeof(g));
+   STBI_NOTUSED(ri);
+
+   u = stbi__gif_load_next(s, &g, comp, req_comp, 0);
+   if (u == (stbi_uc *) s) u = 0;  // end of animated gif marker
+   if (u) {
+      *x = g.w;
+      *y = g.h;
+
+      // moved conversion to after successful load so that the same
+      // can be done for multiple frames.
+      if (req_comp && req_comp != 4)
+         u = stbi__convert_format(u, 4, req_comp, g.w, g.h);
+   } else if (g.out) {
+      // if there was an error and we allocated an image buffer, free it!
+      STBI_FREE(g.out);
+   }
+
+   // free buffers needed for multiple frame loading;
+   STBI_FREE(g.history);
+   STBI_FREE(g.background);
+
+   return u;
+}
+
+static int stbi__gif_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   return stbi__gif_info_raw(s,x,y,comp);
+}
+#endif
+
+// *************************************************************************************************
+// Radiance RGBE HDR loader
+// originally by Nicolas Schulz
+#ifndef STBI_NO_HDR
+static int stbi__hdr_test_core(stbi__context *s, const char *signature)
+{
+   int i;
+   for (i=0; signature[i]; ++i)
+      if (stbi__get8(s) != signature[i])
+          return 0;
+   stbi__rewind(s);
+   return 1;
+}
+
+static int stbi__hdr_test(stbi__context* s)
+{
+   int r = stbi__hdr_test_core(s, "#?RADIANCE\n");
+   stbi__rewind(s);
+   if(!r) {
+       r = stbi__hdr_test_core(s, "#?RGBE\n");
+       stbi__rewind(s);
+   }
+   return r;
+}
+
+#define STBI__HDR_BUFLEN  1024
+static char *stbi__hdr_gettoken(stbi__context *z, char *buffer)
+{
+   int len=0;
+   char c = '\0';
+
+   c = (char) stbi__get8(z);
+
+   while (!stbi__at_eof(z) && c != '\n') {
+      buffer[len++] = c;
+      if (len == STBI__HDR_BUFLEN-1) {
+         // flush to end of line
+         while (!stbi__at_eof(z) && stbi__get8(z) != '\n')
+            ;
+         break;
+      }
+      c = (char) stbi__get8(z);
+   }
+
+   buffer[len] = 0;
+   return buffer;
+}
+
+static void stbi__hdr_convert(float *output, stbi_uc *input, int req_comp)
+{
+   if ( input[3] != 0 ) {
+      float f1;
+      // Exponent
+      f1 = (float) ldexp(1.0f, input[3] - (int)(128 + 8));
+      if (req_comp <= 2)
+         output[0] = (input[0] + input[1] + input[2]) * f1 / 3;
+      else {
+         output[0] = input[0] * f1;
+         output[1] = input[1] * f1;
+         output[2] = input[2] * f1;
+      }
+      if (req_comp == 2) output[1] = 1;
+      if (req_comp == 4) output[3] = 1;
+   } else {
+      switch (req_comp) {
+         case 4: output[3] = 1; /* fallthrough */
+         case 3: output[0] = output[1] = output[2] = 0;
+                 break;
+         case 2: output[1] = 1; /* fallthrough */
+         case 1: output[0] = 0;
+                 break;
+      }
+   }
+}
+
+static float *stbi__hdr_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   char buffer[STBI__HDR_BUFLEN];
+   char *token;
+   int valid = 0;
+   int width, height;
+   stbi_uc *scanline;
+   float *hdr_data;
+   int len;
+   unsigned char count, value;
+   int i, j, k, c1,c2, z;
+   const char *headerToken;
+   STBI_NOTUSED(ri);
+
+   // Check identifier
+   headerToken = stbi__hdr_gettoken(s,buffer);
+   if (strcmp(headerToken, "#?RADIANCE") != 0 && strcmp(headerToken, "#?RGBE") != 0)
+      return stbi__errpf("not HDR", "Corrupt HDR image");
+
+   // Parse header
+   for(;;) {
+      token = stbi__hdr_gettoken(s,buffer);
+      if (token[0] == 0) break;
+      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+   }
+
+   if (!valid)    return stbi__errpf("unsupported format", "Unsupported HDR format");
+
+   // Parse width and height
+   // can't use sscanf() if we're not using stdio!
+   token = stbi__hdr_gettoken(s,buffer);
+   if (strncmp(token, "-Y ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
+   token += 3;
+   height = (int) strtol(token, &token, 10);
+   while (*token == ' ') ++token;
+   if (strncmp(token, "+X ", 3))  return stbi__errpf("unsupported data layout", "Unsupported HDR format");
+   token += 3;
+   width = (int) strtol(token, NULL, 10);
+
+   if (height > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
+   if (width > STBI_MAX_DIMENSIONS) return stbi__errpf("too large","Very large image (corrupt?)");
+
+   *x = width;
+   *y = height;
+
+   if (comp) *comp = 3;
+   if (req_comp == 0) req_comp = 3;
+
+   if (!stbi__mad4sizes_valid(width, height, req_comp, sizeof(float), 0))
+      return stbi__errpf("too large", "HDR image is too large");
+
+   // Read data
+   hdr_data = (float *) stbi__malloc_mad4(width, height, req_comp, sizeof(float), 0);
+   if (!hdr_data)
+      return stbi__errpf("outofmem", "Out of memory");
+
+   // Load image data
+   // image data is stored as some number of scanlines
+   if ( width < 8 || width >= 32768) {
+      // Read flat data
+      for (j=0; j < height; ++j) {
+         for (i=0; i < width; ++i) {
+            stbi_uc rgbe[4];
+           main_decode_loop:
+            stbi__getn(s, rgbe, 4);
+            stbi__hdr_convert(hdr_data + j * width * req_comp + i * req_comp, rgbe, req_comp);
+         }
+      }
+   } else {
+      // Read RLE-encoded data
+      scanline = NULL;
+
+      for (j = 0; j < height; ++j) {
+         c1 = stbi__get8(s);
+         c2 = stbi__get8(s);
+         len = stbi__get8(s);
+         if (c1 != 2 || c2 != 2 || (len & 0x80)) {
+            // not run-length encoded, so we have to actually use THIS data as a decoded
+            // pixel (note this can't be a valid pixel--one of RGB must be >= 128)
+            stbi_uc rgbe[4];
+            rgbe[0] = (stbi_uc) c1;
+            rgbe[1] = (stbi_uc) c2;
+            rgbe[2] = (stbi_uc) len;
+            rgbe[3] = (stbi_uc) stbi__get8(s);
+            stbi__hdr_convert(hdr_data, rgbe, req_comp);
+            i = 1;
+            j = 0;
+            STBI_FREE(scanline);
+            goto main_decode_loop; // yes, this makes no sense
+         }
+         len <<= 8;
+         len |= stbi__get8(s);
+         if (len != width) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("invalid decoded scanline length", "corrupt HDR"); }
+         if (scanline == NULL) {
+            scanline = (stbi_uc *) stbi__malloc_mad2(width, 4, 0);
+            if (!scanline) {
+               STBI_FREE(hdr_data);
+               return stbi__errpf("outofmem", "Out of memory");
+            }
+         }
+
+         for (k = 0; k < 4; ++k) {
+            int nleft;
+            i = 0;
+            while ((nleft = width - i) > 0) {
+               count = stbi__get8(s);
+               if (count > 128) {
+                  // Run
+                  value = stbi__get8(s);
+                  count -= 128;
+                  if (count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
+                  for (z = 0; z < count; ++z)
+                     scanline[i++ * 4 + k] = value;
+               } else {
+                  // Dump
+                  if (count == 0 || count > nleft) { STBI_FREE(hdr_data); STBI_FREE(scanline); return stbi__errpf("corrupt", "bad RLE data in HDR"); }
+                  for (z = 0; z < count; ++z)
+                     scanline[i++ * 4 + k] = stbi__get8(s);
+               }
+            }
+         }
+         for (i=0; i < width; ++i)
+            stbi__hdr_convert(hdr_data+(j*width + i)*req_comp, scanline + i*4, req_comp);
+      }
+      if (scanline)
+         STBI_FREE(scanline);
+   }
+
+   return hdr_data;
+}
+
+static int stbi__hdr_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   char buffer[STBI__HDR_BUFLEN];
+   char *token;
+   int valid = 0;
+   int dummy;
+
+   if (!x) x = &dummy;
+   if (!y) y = &dummy;
+   if (!comp) comp = &dummy;
+
+   if (stbi__hdr_test(s) == 0) {
+       stbi__rewind( s );
+       return 0;
+   }
+
+   for(;;) {
+      token = stbi__hdr_gettoken(s,buffer);
+      if (token[0] == 0) break;
+      if (strcmp(token, "FORMAT=32-bit_rle_rgbe") == 0) valid = 1;
+   }
+
+   if (!valid) {
+       stbi__rewind( s );
+       return 0;
+   }
+   token = stbi__hdr_gettoken(s,buffer);
+   if (strncmp(token, "-Y ", 3)) {
+       stbi__rewind( s );
+       return 0;
+   }
+   token += 3;
+   *y = (int) strtol(token, &token, 10);
+   while (*token == ' ') ++token;
+   if (strncmp(token, "+X ", 3)) {
+       stbi__rewind( s );
+       return 0;
+   }
+   token += 3;
+   *x = (int) strtol(token, NULL, 10);
+   *comp = 3;
+   return 1;
+}
+#endif // STBI_NO_HDR
+
+#ifndef STBI_NO_BMP
+static int stbi__bmp_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   void *p;
+   stbi__bmp_data info;
+
+   info.all_a = 255;
+   p = stbi__bmp_parse_header(s, &info);
+   if (p == NULL) {
+      stbi__rewind( s );
+      return 0;
+   }
+   if (x) *x = s->img_x;
+   if (y) *y = s->img_y;
+   if (comp) {
+      if (info.bpp == 24 && info.ma == 0xff000000)
+         *comp = 3;
+      else
+         *comp = info.ma ? 4 : 3;
+   }
+   return 1;
+}
+#endif
+
+#ifndef STBI_NO_PSD
+static int stbi__psd_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   int channelCount, dummy, depth;
+   if (!x) x = &dummy;
+   if (!y) y = &dummy;
+   if (!comp) comp = &dummy;
+   if (stbi__get32be(s) != 0x38425053) {
+       stbi__rewind( s );
+       return 0;
+   }
+   if (stbi__get16be(s) != 1) {
+       stbi__rewind( s );
+       return 0;
+   }
+   stbi__skip(s, 6);
+   channelCount = stbi__get16be(s);
+   if (channelCount < 0 || channelCount > 16) {
+       stbi__rewind( s );
+       return 0;
+   }
+   *y = stbi__get32be(s);
+   *x = stbi__get32be(s);
+   depth = stbi__get16be(s);
+   if (depth != 8 && depth != 16) {
+       stbi__rewind( s );
+       return 0;
+   }
+   if (stbi__get16be(s) != 3) {
+       stbi__rewind( s );
+       return 0;
+   }
+   *comp = 4;
+   return 1;
+}
+
+static int stbi__psd_is16(stbi__context *s)
+{
+   int channelCount, depth;
+   if (stbi__get32be(s) != 0x38425053) {
+       stbi__rewind( s );
+       return 0;
+   }
+   if (stbi__get16be(s) != 1) {
+       stbi__rewind( s );
+       return 0;
+   }
+   stbi__skip(s, 6);
+   channelCount = stbi__get16be(s);
+   if (channelCount < 0 || channelCount > 16) {
+       stbi__rewind( s );
+       return 0;
+   }
+   STBI_NOTUSED(stbi__get32be(s));
+   STBI_NOTUSED(stbi__get32be(s));
+   depth = stbi__get16be(s);
+   if (depth != 16) {
+       stbi__rewind( s );
+       return 0;
+   }
+   return 1;
+}
+#endif
+
+#ifndef STBI_NO_PIC
+static int stbi__pic_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   int act_comp=0,num_packets=0,chained,dummy;
+   stbi__pic_packet packets[10];
+
+   if (!x) x = &dummy;
+   if (!y) y = &dummy;
+   if (!comp) comp = &dummy;
+
+   if (!stbi__pic_is4(s,"\x53\x80\xF6\x34")) {
+      stbi__rewind(s);
+      return 0;
+   }
+
+   stbi__skip(s, 88);
+
+   *x = stbi__get16be(s);
+   *y = stbi__get16be(s);
+   if (stbi__at_eof(s)) {
+      stbi__rewind( s);
+      return 0;
+   }
+   if ( (*x) != 0 && (1 << 28) / (*x) < (*y)) {
+      stbi__rewind( s );
+      return 0;
+   }
+
+   stbi__skip(s, 8);
+
+   do {
+      stbi__pic_packet *packet;
+
+      if (num_packets==sizeof(packets)/sizeof(packets[0]))
+         return 0;
+
+      packet = &packets[num_packets++];
+      chained = stbi__get8(s);
+      packet->size    = stbi__get8(s);
+      packet->type    = stbi__get8(s);
+      packet->channel = stbi__get8(s);
+      act_comp |= packet->channel;
+
+      if (stbi__at_eof(s)) {
+          stbi__rewind( s );
+          return 0;
+      }
+      if (packet->size != 8) {
+          stbi__rewind( s );
+          return 0;
+      }
+   } while (chained);
+
+   *comp = (act_comp & 0x10 ? 4 : 3);
+
+   return 1;
+}
+#endif
+
+// *************************************************************************************************
+// Portable Gray Map and Portable Pixel Map loader
+// by Ken Miller
+//
+// PGM: http://netpbm.sourceforge.net/doc/pgm.html
+// PPM: http://netpbm.sourceforge.net/doc/ppm.html
+//
+// Known limitations:
+//    Comments in the header section are skipped, not reported to the caller
+//    Does not support ASCII image data (formats P2 and P3)
+
+#ifndef STBI_NO_PNM
+
+static int      stbi__pnm_test(stbi__context *s)
+{
+   char p, t;
+   p = (char) stbi__get8(s);
+   t = (char) stbi__get8(s);
+   if (p != 'P' || (t != '5' && t != '6')) {
+       stbi__rewind( s );
+       return 0;
+   }
+   return 1;
+}
+
+static void *stbi__pnm_load(stbi__context *s, int *x, int *y, int *comp, int req_comp, stbi__result_info *ri)
+{
+   stbi_uc *out;
+   STBI_NOTUSED(ri);
+
+   ri->bits_per_channel = stbi__pnm_info(s, (int *)&s->img_x, (int *)&s->img_y, (int *)&s->img_n);
+   if (ri->bits_per_channel == 0)
+      return 0;
+
+   if (s->img_y > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+   if (s->img_x > STBI_MAX_DIMENSIONS) return stbi__errpuc("too large","Very large image (corrupt?)");
+
+   *x = s->img_x;
+   *y = s->img_y;
+   if (comp) *comp = s->img_n;
+
+   if (!stbi__mad4sizes_valid(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0))
+      return stbi__errpuc("too large", "PNM too large");
+
+   out = (stbi_uc *) stbi__malloc_mad4(s->img_n, s->img_x, s->img_y, ri->bits_per_channel / 8, 0);
+   if (!out) return stbi__errpuc("outofmem", "Out of memory");
+   stbi__getn(s, out, s->img_n * s->img_x * s->img_y * (ri->bits_per_channel / 8));
+
+   if (req_comp && req_comp != s->img_n) {
+      out = stbi__convert_format(out, s->img_n, req_comp, s->img_x, s->img_y);
+      if (out == NULL) return out; // stbi__convert_format frees input on failure
+   }
+   return out;
+}
+
+static int      stbi__pnm_isspace(char c)
+{
+   return c == ' ' || c == '\t' || c == '\n' || c == '\v' || c == '\f' || c == '\r';
+}
+
+static void     stbi__pnm_skip_whitespace(stbi__context *s, char *c)
+{
+   for (;;) {
+      while (!stbi__at_eof(s) && stbi__pnm_isspace(*c))
+         *c = (char) stbi__get8(s);
+
+      if (stbi__at_eof(s) || *c != '#')
+         break;
+
+      while (!stbi__at_eof(s) && *c != '\n' && *c != '\r' )
+         *c = (char) stbi__get8(s);
+   }
+}
+
+static int      stbi__pnm_isdigit(char c)
+{
+   return c >= '0' && c <= '9';
+}
+
+static int      stbi__pnm_getinteger(stbi__context *s, char *c)
+{
+   int value = 0;
+
+   while (!stbi__at_eof(s) && stbi__pnm_isdigit(*c)) {
+      value = value*10 + (*c - '0');
+      *c = (char) stbi__get8(s);
+   }
+
+   return value;
+}
+
+static int      stbi__pnm_info(stbi__context *s, int *x, int *y, int *comp)
+{
+   int maxv, dummy;
+   char c, p, t;
+
+   if (!x) x = &dummy;
+   if (!y) y = &dummy;
+   if (!comp) comp = &dummy;
+
+   stbi__rewind(s);
+
+   // Get identifier
+   p = (char) stbi__get8(s);
+   t = (char) stbi__get8(s);
+   if (p != 'P' || (t != '5' && t != '6')) {
+       stbi__rewind(s);
+       return 0;
+   }
+
+   *comp = (t == '6') ? 3 : 1;  // '5' is 1-component .pgm; '6' is 3-component .ppm
+
+   c = (char) stbi__get8(s);
+   stbi__pnm_skip_whitespace(s, &c);
+
+   *x = stbi__pnm_getinteger(s, &c); // read width
+   stbi__pnm_skip_whitespace(s, &c);
+
+   *y = stbi__pnm_getinteger(s, &c); // read height
+   stbi__pnm_skip_whitespace(s, &c);
+
+   maxv = stbi__pnm_getinteger(s, &c);  // read max value
+   if (maxv > 65535)
+      return stbi__err("max value > 65535", "PPM image supports only 8-bit and 16-bit images");
+   else if (maxv > 255)
+      return 16;
+   else
+      return 8;
+}
+
+static int stbi__pnm_is16(stbi__context *s)
+{
+   if (stbi__pnm_info(s, NULL, NULL, NULL) == 16)
+      return 1;
+   return 0;
+}
+#endif
+
+static int stbi__info_main(stbi__context *s, int *x, int *y, int *comp)
+{
+   #ifndef STBI_NO_JPEG
+   if (stbi__jpeg_info(s, x, y, comp)) return 1;
+   #endif
+
+   #ifndef STBI_NO_PNG
+   if (stbi__png_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_GIF
+   if (stbi__gif_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_BMP
+   if (stbi__bmp_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_PSD
+   if (stbi__psd_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_PIC
+   if (stbi__pic_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_PNM
+   if (stbi__pnm_info(s, x, y, comp))  return 1;
+   #endif
+
+   #ifndef STBI_NO_HDR
+   if (stbi__hdr_info(s, x, y, comp))  return 1;
+   #endif
+
+   // test tga last because it's a crappy test!
+   #ifndef STBI_NO_TGA
+   if (stbi__tga_info(s, x, y, comp))
+       return 1;
+   #endif
+   return stbi__err("unknown image type", "Image not of any known type, or corrupt");
+}
+
+static int stbi__is_16_main(stbi__context *s)
+{
+   #ifndef STBI_NO_PNG
+   if (stbi__png_is16(s))  return 1;
+   #endif
+
+   #ifndef STBI_NO_PSD
+   if (stbi__psd_is16(s))  return 1;
+   #endif
+
+   #ifndef STBI_NO_PNM
+   if (stbi__pnm_is16(s))  return 1;
+   #endif
+   return 0;
+}
+
+#ifndef STBI_NO_STDIO
+STBIDEF int stbi_info(char const *filename, int *x, int *y, int *comp)
+{
+    FILE *f = stbi__fopen(filename, "rb");
+    int result;
+    if (!f) return stbi__err("can't fopen", "Unable to open file");
+    result = stbi_info_from_file(f, x, y, comp);
+    fclose(f);
+    return result;
+}
+
+STBIDEF int stbi_info_from_file(FILE *f, int *x, int *y, int *comp)
+{
+   int r;
+   stbi__context s;
+   long pos = ftell(f);
+   stbi__start_file(&s, f);
+   r = stbi__info_main(&s,x,y,comp);
+   fseek(f,pos,SEEK_SET);
+   return r;
+}
+
+STBIDEF int stbi_is_16_bit(char const *filename)
+{
+    FILE *f = stbi__fopen(filename, "rb");
+    int result;
+    if (!f) return stbi__err("can't fopen", "Unable to open file");
+    result = stbi_is_16_bit_from_file(f);
+    fclose(f);
+    return result;
+}
+
+STBIDEF int stbi_is_16_bit_from_file(FILE *f)
+{
+   int r;
+   stbi__context s;
+   long pos = ftell(f);
+   stbi__start_file(&s, f);
+   r = stbi__is_16_main(&s);
+   fseek(f,pos,SEEK_SET);
+   return r;
+}
+#endif // !STBI_NO_STDIO
+
+STBIDEF int stbi_info_from_memory(stbi_uc const *buffer, int len, int *x, int *y, int *comp)
+{
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__info_main(&s,x,y,comp);
+}
+
+STBIDEF int stbi_info_from_callbacks(stbi_io_callbacks const *c, void *user, int *x, int *y, int *comp)
+{
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
+   return stbi__info_main(&s,x,y,comp);
+}
+
+STBIDEF int stbi_is_16_bit_from_memory(stbi_uc const *buffer, int len)
+{
+   stbi__context s;
+   stbi__start_mem(&s,buffer,len);
+   return stbi__is_16_main(&s);
+}
+
+STBIDEF int stbi_is_16_bit_from_callbacks(stbi_io_callbacks const *c, void *user)
+{
+   stbi__context s;
+   stbi__start_callbacks(&s, (stbi_io_callbacks *) c, user);
+   return stbi__is_16_main(&s);
+}
+
+#endif // STB_IMAGE_IMPLEMENTATION
+
+/*
+   revision history:
+      2.20  (2019-02-07) support utf8 filenames in Windows; fix warnings and platform ifdefs
+      2.19  (2018-02-11) fix warning
+      2.18  (2018-01-30) fix warnings
+      2.17  (2018-01-29) change stbi__shiftsigned to avoid clang -O2 bug
+                         1-bit BMP
+                         *_is_16_bit api
+                         avoid warnings
+      2.16  (2017-07-23) all functions have 16-bit variants;
+                         STBI_NO_STDIO works again;
+                         compilation fixes;
+                         fix rounding in unpremultiply;
+                         optimize vertical flip;
+                         disable raw_len validation;
+                         documentation fixes
+      2.15  (2017-03-18) fix png-1,2,4 bug; now all Imagenet JPGs decode;
+                         warning fixes; disable run-time SSE detection on gcc;
+                         uniform handling of optional "return" values;
+                         thread-safe initialization of zlib tables
+      2.14  (2017-03-03) remove deprecated STBI_JPEG_OLD; fixes for Imagenet JPGs
+      2.13  (2016-11-29) add 16-bit API, only supported for PNG right now
+      2.12  (2016-04-02) fix typo in 2.11 PSD fix that caused crashes
+      2.11  (2016-04-02) allocate large structures on the stack
+                         remove white matting for transparent PSD
+                         fix reported channel count for PNG & BMP
+                         re-enable SSE2 in non-gcc 64-bit
+                         support RGB-formatted JPEG
+                         read 16-bit PNGs (only as 8-bit)
+      2.10  (2016-01-22) avoid warning introduced in 2.09 by STBI_REALLOC_SIZED
+      2.09  (2016-01-16) allow comments in PNM files
+                         16-bit-per-pixel TGA (not bit-per-component)
+                         info() for TGA could break due to .hdr handling
+                         info() for BMP now shares code instead of sloppy parse
+                         can use STBI_REALLOC_SIZED if allocator doesn't support realloc
+                         code cleanup
+      2.08  (2015-09-13) fix to 2.07 cleanup, reading RGB PSD as RGBA
+      2.07  (2015-09-13) fix compiler warnings
+                         partial animated GIF support
+                         limited 16-bpc PSD support
+                         #ifdef unused functions
+                         bug with < 92 byte PIC,PNM,HDR,TGA
+      2.06  (2015-04-19) fix bug where PSD returns wrong '*comp' value
+      2.05  (2015-04-19) fix bug in progressive JPEG handling, fix warning
+      2.04  (2015-04-15) try to re-enable SIMD on MinGW 64-bit
+      2.03  (2015-04-12) extra corruption checking (mmozeiko)
+                         stbi_set_flip_vertically_on_load (nguillemot)
+                         fix NEON support; fix mingw support
+      2.02  (2015-01-19) fix incorrect assert, fix warning
+      2.01  (2015-01-17) fix various warnings; suppress SIMD on gcc 32-bit without -msse2
+      2.00b (2014-12-25) fix STBI_MALLOC in progressive JPEG
+      2.00  (2014-12-25) optimize JPG, including x86 SSE2 & NEON SIMD (ryg)
+                         progressive JPEG (stb)
+                         PGM/PPM support (Ken Miller)
+                         STBI_MALLOC,STBI_REALLOC,STBI_FREE
+                         GIF bugfix -- seemingly never worked
+                         STBI_NO_*, STBI_ONLY_*
+      1.48  (2014-12-14) fix incorrectly-named assert()
+      1.47  (2014-12-14) 1/2/4-bit PNG support, both direct and paletted (Omar Cornut & stb)
+                         optimize PNG (ryg)
+                         fix bug in interlaced PNG with user-specified channel count (stb)
+      1.46  (2014-08-26)
+              fix broken tRNS chunk (colorkey-style transparency) in non-paletted PNG
+      1.45  (2014-08-16)
+              fix MSVC-ARM internal compiler error by wrapping malloc
+      1.44  (2014-08-07)
+              various warning fixes from Ronny Chevalier
+      1.43  (2014-07-15)
+              fix MSVC-only compiler problem in code changed in 1.42
+      1.42  (2014-07-09)
+              don't define _CRT_SECURE_NO_WARNINGS (affects user code)
+              fixes to stbi__cleanup_jpeg path
+              added STBI_ASSERT to avoid requiring assert.h
+      1.41  (2014-06-25)
+              fix search&replace from 1.36 that messed up comments/error messages
+      1.40  (2014-06-22)
+              fix gcc struct-initialization warning
+      1.39  (2014-06-15)
+              fix to TGA optimization when req_comp != number of components in TGA;
+              fix to GIF loading because BMP wasn't rewinding (whoops, no GIFs in my test suite)
+              add support for BMP version 5 (more ignored fields)
+      1.38  (2014-06-06)
+              suppress MSVC warnings on integer casts truncating values
+              fix accidental rename of 'skip' field of I/O
+      1.37  (2014-06-04)
+              remove duplicate typedef
+      1.36  (2014-06-03)
+              convert to header file single-file library
+              if de-iphone isn't set, load iphone images color-swapped instead of returning NULL
+      1.35  (2014-05-27)
+              various warnings
+              fix broken STBI_SIMD path
+              fix bug where stbi_load_from_file no longer left file pointer in correct place
+              fix broken non-easy path for 32-bit BMP (possibly never used)
+              TGA optimization by Arseny Kapoulkine
+      1.34  (unknown)
+              use STBI_NOTUSED in stbi__resample_row_generic(), fix one more leak in tga failure case
+      1.33  (2011-07-14)
+              make stbi_is_hdr work in STBI_NO_HDR (as specified), minor compiler-friendly improvements
+      1.32  (2011-07-13)
+              support for "info" function for all supported filetypes (SpartanJ)
+      1.31  (2011-06-20)
+              a few more leak fixes, bug in PNG handling (SpartanJ)
+      1.30  (2011-06-11)
+              added ability to load files via callbacks to accommodate custom input streams (Ben Wenger)
+              removed deprecated format-specific test/load functions
+              removed support for installable file formats (stbi_loader) -- would have been broken for IO callbacks anyway
+              error cases in bmp and tga give messages and don't leak (Raymond Barbiero, grisha)
+              fix inefficiency in decoding 32-bit BMP (David Woo)
+      1.29  (2010-08-16)
+              various warning fixes from Aurelien Pocheville
+      1.28  (2010-08-01)
+              fix bug in GIF palette transparency (SpartanJ)
+      1.27  (2010-08-01)
+              cast-to-stbi_uc to fix warnings
+      1.26  (2010-07-24)
+              fix bug in file buffering for PNG reported by SpartanJ
+      1.25  (2010-07-17)
+              refix trans_data warning (Won Chun)
+      1.24  (2010-07-12)
+              perf improvements reading from files on platforms with lock-heavy fgetc()
+              minor perf improvements for jpeg
+              deprecated type-specific functions so we'll get feedback if they're needed
+              attempt to fix trans_data warning (Won Chun)
+      1.23    fixed bug in iPhone support
+      1.22  (2010-07-10)
+              removed image *writing* support
+              stbi_info support from Jetro Lauha
+              GIF support from Jean-Marc Lienher
+              iPhone PNG-extensions from James Brown
+              warning-fixes from Nicolas Schulz and Janez Zemva (i.e. Janez (U+017D)emva)
+      1.21    fix use of 'stbi_uc' in header (reported by jon blow)
+      1.20    added support for Softimage PIC, by Tom Seddon
+      1.19    bug in interlaced PNG corruption check (found by ryg)
+      1.18  (2008-08-02)
+              fix a threading bug (local mutable static)
+      1.17    support interlaced PNG
+      1.16    major bugfix - stbi__convert_format converted one too many pixels
+      1.15    initialize some fields for thread safety
+      1.14    fix threadsafe conversion bug
+              header-file-only version (#define STBI_HEADER_FILE_ONLY before including)
+      1.13    threadsafe
+      1.12    const qualifiers in the API
+      1.11    Support installable IDCT, colorspace conversion routines
+      1.10    Fixes for 64-bit (don't use "unsigned long")
+              optimized upsampling by Fabian "ryg" Giesen
+      1.09    Fix format-conversion for PSD code (bad global variables!)
+      1.08    Thatcher Ulrich's PSD code integrated by Nicolas Schulz
+      1.07    attempt to fix C++ warning/errors again
+      1.06    attempt to fix C++ warning/errors again
+      1.05    fix TGA loading to return correct *comp and use good luminance calc
+      1.04    default float alpha is 1, not 255; use 'void *' for stbi_image_free
+      1.03    bugfixes to STBI_NO_STDIO, STBI_NO_HDR
+      1.02    support for (subset of) HDR files, float interface for preferred access to them
+      1.01    fix bug: possible bug in handling right-side up bmps... not sure
+              fix bug: the stbi__bmp_load() and stbi__tga_load() functions didn't work at all
+      1.00    interface to zlib that skips zlib header
+      0.99    correct handling of alpha in palette
+      0.98    TGA loader by lonesock; dynamically add loaders (untested)
+      0.97    jpeg errors on too large a file; also catch another malloc failure
+      0.96    fix detection of invalid v value - particleman@mollyrocket forum
+      0.95    during header scan, seek to markers in case of padding
+      0.94    STBI_NO_STDIO to disable stdio usage; rename all #defines the same
+      0.93    handle jpegtran output; verbose errors
+      0.92    read 4,8,16,24,32-bit BMP files of several formats
+      0.91    output 24-bit Windows 3.0 BMP files
+      0.90    fix a few more warnings; bump version number to approach 1.0
+      0.61    bugfixes due to Marc LeBlanc, Christopher Lloyd
+      0.60    fix compiling as c++
+      0.59    fix warnings: merge Dave Moore's -Wall fixes
+      0.58    fix bug: zlib uncompressed mode len/nlen was wrong endian
+      0.57    fix bug: jpg last huffman symbol before marker was >9 bits but less than 16 available
+      0.56    fix bug: zlib uncompressed mode len vs. nlen
+      0.55    fix bug: restart_interval not initialized to 0
+      0.54    allow NULL for 'int *comp'
+      0.53    fix bug in png 3->4; speedup png decoding
+      0.52    png handles req_comp=3,4 directly; minor cleanup; jpeg comments
+      0.51    obey req_comp requests, 1-component jpegs return as 1-component,
+              on 'test' only check type, not whether we support this variant
+      0.50  (2006-11-19)
+              first released version
+*/
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
diff --git a/neo/renderer/tr_backend.cpp b/neo/renderer/tr_backend.cpp
index 18224a0..cd1fb7c 100644
--- a/neo/renderer/tr_backend.cpp
+++ b/neo/renderer/tr_backend.cpp
@@ -29,6 +29,8 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "renderer/tr_local.h"
 
+static idCVar r_fillWindowAlphaChan( "r_fillWindowAlphaChan", "-1", CVAR_SYSTEM | CVAR_NOCHEAT | CVAR_ARCHIVE, "Make sure the alpha channel of the window's default framebuffer is completely opaque at the end of each frame. Needed at least when using Wayland.\n 1: do this, 0: don't do it, -1: let dhewm3 decide (default)" );
+
 frameData_t		*frameData;
 backEndState_t	backEnd;
 
@@ -529,6 +531,72 @@ const void	RB_SwapBuffers( const void *data ) {
 		RB_ShowImages();
 	}
 
+	int fillAlpha = r_fillWindowAlphaChan.GetInteger();
+	if ( fillAlpha == 1 || (fillAlpha == -1 && glConfig.shouldFillWindowAlpha) )
+	{
+		// make sure the whole alpha chan of the (default) framebuffer is opaque.
+		// at least Wayland needs this, see also the big comment in GLimp_Init()
+
+		bool blendEnabled = qglIsEnabled( GL_BLEND );
+		if ( !blendEnabled )
+			qglEnable( GL_BLEND );
+
+		// TODO: GL_DEPTH_TEST ? (should be disabled, if it needs changing at all)
+
+		bool scissorEnabled = qglIsEnabled( GL_SCISSOR_TEST );
+		if( scissorEnabled )
+			qglDisable( GL_SCISSOR_TEST );
+
+		bool tex2Denabled = qglIsEnabled( GL_TEXTURE_2D );
+		if( tex2Denabled )
+			qglDisable( GL_TEXTURE_2D );
+
+		qglDisable( GL_VERTEX_PROGRAM_ARB );
+		qglDisable( GL_FRAGMENT_PROGRAM_ARB );
+
+		qglBlendEquation( GL_FUNC_ADD );
+
+		qglBlendFunc( GL_ONE, GL_ONE );
+
+		// setup transform matrices so we can easily/reliably draw a fullscreen quad
+		qglMatrixMode( GL_MODELVIEW );
+		qglPushMatrix();
+		qglLoadIdentity();
+
+		qglMatrixMode( GL_PROJECTION );
+		qglPushMatrix();
+		qglLoadIdentity();
+		qglOrtho( 0, 1, 0, 1, -1, 1 );
+
+		// draw screen-sized quad with color (0.0, 0.0, 0.0, 1.0)
+		const float x=0, y=0, w=1, h=1;
+		qglColor4f( 0.0f, 0.0f, 0.0f, 1.0f );
+		// debug values:
+		//const float x = 0.1, y = 0.1, w = 0.8, h = 0.8;
+		//qglColor4f( 0.0f, 0.0f, 0.5f, 1.0f );
+
+		qglBegin( GL_QUADS );
+			qglVertex2f( x,   y   ); // ( 0,0 );
+			qglVertex2f( x,   y+h ); // ( 0,1 );
+			qglVertex2f( x+w, y+h ); // ( 1,1 );
+			qglVertex2f( x+w, y   ); // ( 1,0 );
+		qglEnd();
+
+		// restore previous transform matrix states
+		qglPopMatrix(); // for projection
+		qglMatrixMode( GL_MODELVIEW );
+		qglPopMatrix(); // for modelview
+
+		// restore default or previous states
+		qglBlendEquation( GL_FUNC_ADD );
+		if ( !blendEnabled )
+			qglDisable( GL_BLEND );
+		if( tex2Denabled )
+			qglEnable( GL_TEXTURE_2D );
+		if( scissorEnabled )
+			qglEnable( GL_SCISSOR_TEST );
+	}
+
 	// force a gl sync if requested
 	if ( r_finish.GetBool() ) {
 		qglFinish();
diff --git a/neo/renderer/tr_local.h b/neo/renderer/tr_local.h
index f2a4d65..2ace628 100644
--- a/neo/renderer/tr_local.h
+++ b/neo/renderer/tr_local.h
@@ -548,7 +548,6 @@ extern	frameData_t	*frameData;
 
 //=======================================================================
 
-void R_LockSurfaceScene( viewDef_t *parms );
 void R_ClearCommandChain( void );
 void R_AddDrawViewCmd( viewDef_t *parms );
 
@@ -785,6 +784,9 @@ public:
 	performanceCounters_t	pc;					// performance counters
 
 	drawSurfsCommand_t		lockSurfacesCmd;	// use this when r_lockSurfaces = 1
+	//renderView_t			lockSurfacesRenderView;
+	viewDef_t				lockSurfacesViewDef; // of locked position/view
+	viewDef_t				lockSurfacesRealViewDef; // of actual player position
 
 	viewEntity_t			identitySpace;		// can use if we don't know viewDef->worldSpace is valid
 	int						stencilIncr, stencilDecr;	// GL_INCR / INCR_WRAP_EXT, GL_DECR / GL_DECR_EXT
@@ -835,6 +837,7 @@ extern idCVar r_flareSize;				// scale the flare deforms from the material def
 
 extern idCVar r_gamma;					// changes gamma tables
 extern idCVar r_brightness;				// changes gamma tables
+extern idCVar r_gammaInShader;			// set gamma+brightness in shader instead of modifying system gamma tables
 
 extern idCVar r_renderer;				// arb2, etc
 
@@ -1090,6 +1093,8 @@ void		GLimp_SetGamma( unsigned short red[256],
 // These are now taken as 16 bit values, so we can take full advantage
 // of dacs with >8 bits of precision
 
+void		GLimp_ResetGamma();
+// resets the gamma to what it was at startup
 
 // Returns false if the system only has a single processor
 
@@ -1103,10 +1108,9 @@ void		GLimp_DeactivateContext( void );
 // being immediate returns, which lets us guage how much time is
 // being spent inside OpenGL.
 
-const int GRAB_ENABLE		= (1 << 0);
-const int GRAB_REENABLE		= (1 << 1);
-const int GRAB_HIDECURSOR	= (1 << 2);
-const int GRAB_SETSTATE		= (1 << 3);
+const int GRAB_GRABMOUSE	= (1 << 0);
+const int GRAB_HIDECURSOR	= (1 << 1);
+const int GRAB_RELATIVEMOUSE = (1 << 2);
 
 void GLimp_GrabInput(int flags);
 /*
diff --git a/neo/renderer/tr_main.cpp b/neo/renderer/tr_main.cpp
index 3f3194e..3fb5c02 100644
--- a/neo/renderer/tr_main.cpp
+++ b/neo/renderer/tr_main.cpp
@@ -184,9 +184,6 @@ R_ToggleSmpFrame
 ====================
 */
 void R_ToggleSmpFrame( void ) {
-	if ( r_lockSurfaces.GetBool() ) {
-		return;
-	}
 	R_FreeDeferredTriSurfs( frameData );
 
 	// clear frame-temporary data
@@ -894,7 +891,7 @@ R_SetupProjection
 This uses the "infinite far z" trick
 ===============
 */
-void R_SetupProjection( void ) {
+void R_SetupProjection( viewDef_t * viewDef ) {
 	float	xmin, xmax, ymin, ymax;
 	float	width, height;
 	float	zNear;
@@ -915,48 +912,48 @@ void R_SetupProjection( void ) {
 	// set up projection matrix
 	//
 	zNear	= r_znear.GetFloat();
-	if ( tr.viewDef->renderView.cramZNear ) {
+	if ( viewDef->renderView.cramZNear ) {
 		zNear *= 0.25;
 	}
 
-	ymax = zNear * tan( tr.viewDef->renderView.fov_y * idMath::PI / 360.0f );
+	ymax = zNear * tan( viewDef->renderView.fov_y * idMath::PI / 360.0f );
 	ymin = -ymax;
 
-	xmax = zNear * tan( tr.viewDef->renderView.fov_x * idMath::PI / 360.0f );
+	xmax = zNear * tan( viewDef->renderView.fov_x * idMath::PI / 360.0f );
 	xmin = -xmax;
 
 	width = xmax - xmin;
 	height = ymax - ymin;
 
-	jitterx = jitterx * width / ( tr.viewDef->viewport.x2 - tr.viewDef->viewport.x1 + 1 );
+	jitterx = jitterx * width / ( viewDef->viewport.x2 - viewDef->viewport.x1 + 1 );
 	xmin += jitterx;
 	xmax += jitterx;
-	jittery = jittery * height / ( tr.viewDef->viewport.y2 - tr.viewDef->viewport.y1 + 1 );
+	jittery = jittery * height / ( viewDef->viewport.y2 - viewDef->viewport.y1 + 1 );
 	ymin += jittery;
 	ymax += jittery;
 
-	tr.viewDef->projectionMatrix[0] = 2 * zNear / width;
-	tr.viewDef->projectionMatrix[4] = 0;
-	tr.viewDef->projectionMatrix[8] = ( xmax + xmin ) / width;	// normally 0
-	tr.viewDef->projectionMatrix[12] = 0;
+	viewDef->projectionMatrix[0] = 2 * zNear / width;
+	viewDef->projectionMatrix[4] = 0;
+	viewDef->projectionMatrix[8] = ( xmax + xmin ) / width;	// normally 0
+	viewDef->projectionMatrix[12] = 0;
 
-	tr.viewDef->projectionMatrix[1] = 0;
-	tr.viewDef->projectionMatrix[5] = 2 * zNear / height;
-	tr.viewDef->projectionMatrix[9] = ( ymax + ymin ) / height;	// normally 0
-	tr.viewDef->projectionMatrix[13] = 0;
+	viewDef->projectionMatrix[1] = 0;
+	viewDef->projectionMatrix[5] = 2 * zNear / height;
+	viewDef->projectionMatrix[9] = ( ymax + ymin ) / height;	// normally 0
+	viewDef->projectionMatrix[13] = 0;
 
 	// this is the far-plane-at-infinity formulation, and
 	// crunches the Z range slightly so w=0 vertexes do not
 	// rasterize right at the wraparound point
-	tr.viewDef->projectionMatrix[2] = 0;
-	tr.viewDef->projectionMatrix[6] = 0;
-	tr.viewDef->projectionMatrix[10] = -0.999f;
-	tr.viewDef->projectionMatrix[14] = -2.0f * zNear;
-
-	tr.viewDef->projectionMatrix[3] = 0;
-	tr.viewDef->projectionMatrix[7] = 0;
-	tr.viewDef->projectionMatrix[11] = -1;
-	tr.viewDef->projectionMatrix[15] = 0;
+	viewDef->projectionMatrix[2] = 0;
+	viewDef->projectionMatrix[6] = 0;
+	viewDef->projectionMatrix[10] = -0.999f;
+	viewDef->projectionMatrix[14] = -2.0f * zNear;
+
+	viewDef->projectionMatrix[3] = 0;
+	viewDef->projectionMatrix[7] = 0;
+	viewDef->projectionMatrix[11] = -1;
+	viewDef->projectionMatrix[15] = 0;
 }
 
 /*
@@ -967,30 +964,31 @@ Setup that culling frustum planes for the current view
 FIXME: derive from modelview matrix times projection matrix
 =================
 */
-static void R_SetupViewFrustum( void ) {
+//static
+void R_SetupViewFrustum( viewDef_t* viewDef ) {
 	int		i;
 	float	xs, xc;
 	float	ang;
 
-	ang = DEG2RAD( tr.viewDef->renderView.fov_x ) * 0.5f;
+	ang = DEG2RAD( viewDef->renderView.fov_x ) * 0.5f;
 	idMath::SinCos( ang, xs, xc );
 
-	tr.viewDef->frustum[0] = xs * tr.viewDef->renderView.viewaxis[0] + xc * tr.viewDef->renderView.viewaxis[1];
-	tr.viewDef->frustum[1] = xs * tr.viewDef->renderView.viewaxis[0] - xc * tr.viewDef->renderView.viewaxis[1];
+	viewDef->frustum[0] = xs * viewDef->renderView.viewaxis[0] + xc * viewDef->renderView.viewaxis[1];
+	viewDef->frustum[1] = xs * viewDef->renderView.viewaxis[0] - xc * viewDef->renderView.viewaxis[1];
 
-	ang = DEG2RAD( tr.viewDef->renderView.fov_y ) * 0.5f;
+	ang = DEG2RAD( viewDef->renderView.fov_y ) * 0.5f;
 	idMath::SinCos( ang, xs, xc );
 
-	tr.viewDef->frustum[2] = xs * tr.viewDef->renderView.viewaxis[0] + xc * tr.viewDef->renderView.viewaxis[2];
-	tr.viewDef->frustum[3] = xs * tr.viewDef->renderView.viewaxis[0] - xc * tr.viewDef->renderView.viewaxis[2];
+	viewDef->frustum[2] = xs * viewDef->renderView.viewaxis[0] + xc * viewDef->renderView.viewaxis[2];
+	viewDef->frustum[3] = xs * viewDef->renderView.viewaxis[0] - xc * viewDef->renderView.viewaxis[2];
 
 	// plane four is the front clipping plane
-	tr.viewDef->frustum[4] = /* vec3_origin - */ tr.viewDef->renderView.viewaxis[0];
+	viewDef->frustum[4] = /* vec3_origin - */ viewDef->renderView.viewaxis[0];
 
 	for ( i = 0; i < 5; i++ ) {
 		// flip direction so positive side faces out (FIXME: globally unify this)
-		tr.viewDef->frustum[i] = -tr.viewDef->frustum[i].Normal();
-		tr.viewDef->frustum[i][3] = -( tr.viewDef->renderView.vieworg * tr.viewDef->frustum[i].Normal() );
+		viewDef->frustum[i] = -viewDef->frustum[i].Normal();
+		viewDef->frustum[i][3] = -( viewDef->renderView.vieworg * viewDef->frustum[i].Normal() );
 	}
 
 	// eventually, plane five will be the rear clipping plane for fog
@@ -998,16 +996,16 @@ static void R_SetupViewFrustum( void ) {
 	float dNear, dFar, dLeft, dUp;
 
 	dNear = r_znear.GetFloat();
-	if ( tr.viewDef->renderView.cramZNear ) {
+	if ( viewDef->renderView.cramZNear ) {
 		dNear *= 0.25f;
 	}
 
 	dFar = MAX_WORLD_SIZE;
-	dLeft = dFar * tan( DEG2RAD( tr.viewDef->renderView.fov_x * 0.5f ) );
-	dUp = dFar * tan( DEG2RAD( tr.viewDef->renderView.fov_y * 0.5f ) );
-	tr.viewDef->viewFrustum.SetOrigin( tr.viewDef->renderView.vieworg );
-	tr.viewDef->viewFrustum.SetAxis( tr.viewDef->renderView.viewaxis );
-	tr.viewDef->viewFrustum.SetSize( dNear, dFar, dLeft, dUp );
+	dLeft = dFar * tan( DEG2RAD( viewDef->renderView.fov_x * 0.5f ) );
+	dUp = dFar * tan( DEG2RAD( viewDef->renderView.fov_y * 0.5f ) );
+	viewDef->viewFrustum.SetOrigin( viewDef->renderView.vieworg );
+	viewDef->viewFrustum.SetAxis( viewDef->renderView.viewaxis );
+	viewDef->viewFrustum.SetSize( dNear, dFar, dLeft, dUp );
 }
 
 /*
@@ -1115,11 +1113,11 @@ void R_RenderView( viewDef_t *parms ) {
 
 	// the four sides of the view frustum are needed
 	// for culling and portal visibility
-	R_SetupViewFrustum();
+	R_SetupViewFrustum( tr.viewDef );
 
 	// we need to set the projection matrix before doing
 	// portal-to-screen scissor box calculations
-	R_SetupProjection();
+	R_SetupProjection( tr.viewDef );
 
 	// identify all the visible portalAreas, and the entityDefs and
 	// lightDefs that are in them and pass culling.
diff --git a/neo/renderer/tr_render.cpp b/neo/renderer/tr_render.cpp
index 92cbca0..2b7e10f 100644
--- a/neo/renderer/tr_render.cpp
+++ b/neo/renderer/tr_render.cpp
@@ -554,23 +554,26 @@ to actually render the visible surfaces for this view
 =================
 */
 void RB_BeginDrawingView (void) {
+
+	const viewDef_t* viewDef = backEnd.viewDef;
+
 	// set the modelview matrix for the viewer
 	qglMatrixMode(GL_PROJECTION);
-	qglLoadMatrixf( backEnd.viewDef->projectionMatrix );
+	qglLoadMatrixf( viewDef->projectionMatrix );
 	qglMatrixMode(GL_MODELVIEW);
 
 	// set the window clipping
-	qglViewport( tr.viewportOffset[0] + backEnd.viewDef->viewport.x1,
-		tr.viewportOffset[1] + backEnd.viewDef->viewport.y1,
-		backEnd.viewDef->viewport.x2 + 1 - backEnd.viewDef->viewport.x1,
-		backEnd.viewDef->viewport.y2 + 1 - backEnd.viewDef->viewport.y1 );
+	qglViewport( tr.viewportOffset[0] + viewDef->viewport.x1,
+		tr.viewportOffset[1] + viewDef->viewport.y1,
+		viewDef->viewport.x2 + 1 - viewDef->viewport.x1,
+		viewDef->viewport.y2 + 1 - viewDef->viewport.y1 );
 
 	// the scissor may be smaller than the viewport for subviews
-	qglScissor( tr.viewportOffset[0] + backEnd.viewDef->viewport.x1 + backEnd.viewDef->scissor.x1,
-		tr.viewportOffset[1] + backEnd.viewDef->viewport.y1 + backEnd.viewDef->scissor.y1,
-		backEnd.viewDef->scissor.x2 + 1 - backEnd.viewDef->scissor.x1,
-		backEnd.viewDef->scissor.y2 + 1 - backEnd.viewDef->scissor.y1 );
-	backEnd.currentScissor = backEnd.viewDef->scissor;
+	qglScissor( tr.viewportOffset[0] + viewDef->viewport.x1 + viewDef->scissor.x1,
+		tr.viewportOffset[1] + viewDef->viewport.y1 + viewDef->scissor.y1,
+		viewDef->scissor.x2 + 1 - viewDef->scissor.x1,
+		viewDef->scissor.y2 + 1 - viewDef->scissor.y1 );
+	backEnd.currentScissor = viewDef->scissor;
 
 	// ensures that depth writes are enabled for the depth clear
 	GL_State( GLS_DEFAULT );
@@ -694,6 +697,10 @@ void RB_CreateSingleDrawInteractions( const drawSurf_t *surf, void (*DrawInterac
 	const idMaterial	*lightShader = vLight->lightShader;
 	const float			*lightRegs = vLight->shaderRegisters;
 	drawInteraction_t	inter;
+	inter.diffuseMatrix[0].Zero();
+	inter.diffuseMatrix[1].Zero();
+	inter.specularMatrix[0].Zero();
+	inter.specularMatrix[1].Zero();
 
 	if ( r_skipInteractions.GetBool() || !surf->geo || !surf->geo->ambientCache ) {
 		return;
@@ -848,6 +855,31 @@ void RB_DrawView( const void *data ) {
 
 	cmd = (const drawSurfsCommand_t *)data;
 
+	// with r_lockSurfaces enabled, we set the locked render view
+	// for the primary viewDef for all the "what should be drawn" calculations.
+	// now it must be reverted to the real render view so the scene gets rendered
+	// from the actual current player's point of view
+	if(r_lockSurfaces.GetBool() && tr.primaryView == cmd->viewDef) {
+		viewDef_t* parms = cmd->viewDef;
+		const viewDef_t origParms = *parms;
+
+		*parms = tr.lockSurfacesRealViewDef; // actual current player/camera position
+		parms->renderWorld = origParms.renderWorld;
+		parms->floatTime = origParms.floatTime;
+		parms->drawSurfs = origParms.drawSurfs;
+		parms->numDrawSurfs = origParms.numDrawSurfs;
+		parms->maxDrawSurfs = origParms.maxDrawSurfs;
+		parms->viewLights = origParms.viewLights;
+		parms->viewEntitys = origParms.viewEntitys;
+		parms->connectedAreas = origParms.connectedAreas;
+
+		for( viewEntity_t* vModel = parms->viewEntitys ; vModel ; vModel = vModel->next ) {
+			myGlMultMatrix( vModel->modelMatrix,
+				parms->worldSpace.modelViewMatrix,
+				vModel->modelViewMatrix );
+		}
+	}
+
 	backEnd.viewDef = cmd->viewDef;
 
 	// we will need to do a new copyTexSubImage of the screen
diff --git a/neo/renderer/tr_subview.cpp b/neo/renderer/tr_subview.cpp
index 58c789a..d3049cc 100644
--- a/neo/renderer/tr_subview.cpp
+++ b/neo/renderer/tr_subview.cpp
@@ -490,6 +490,39 @@ bool	R_GenerateSurfaceSubview( drawSurf_t *drawSurf ) {
 		return false;
 	}
 
+	// DG: r_lockSurfaces needs special treatment
+	if ( r_lockSurfaces.GetBool() && tr.viewDef == tr.primaryView ) {
+		// we need the scissor for the "real" viewDef (actual camera position etc)
+		// so mirrors don't "float around" when looking around with r_lockSurfaces enabled
+		// So do the same calculation as before, but with real viewDef (but don't replace
+		// calculation above, so the whole mirror or whatever is skipped if not visible in
+		// locked view!)
+		viewDef_t* origViewDef = tr.viewDef;
+		tr.viewDef = &tr.lockSurfacesRealViewDef;
+		R_PreciseCullSurface( drawSurf, ndcBounds );
+
+		idScreenRect origScissor = scissor;
+
+		idScreenRect	*v2 = &tr.viewDef->viewport;
+		scissor.x1 = v2->x1 + (int)( (v2->x2 - v2->x1 + 1 ) * 0.5f * ( ndcBounds[0][0] + 1.0f ));
+		scissor.y1 = v2->y1 + (int)( (v2->y2 - v2->y1 + 1 ) * 0.5f * ( ndcBounds[0][1] + 1.0f ));
+		scissor.x2 = v2->x1 + (int)( (v2->x2 - v2->x1 + 1 ) * 0.5f * ( ndcBounds[1][0] + 1.0f ));
+		scissor.y2 = v2->y1 + (int)( (v2->y2 - v2->y1 + 1 ) * 0.5f * ( ndcBounds[1][1] + 1.0f ));
+
+		// nudge a bit for safety
+		scissor.Expand();
+
+		scissor.Intersect( tr.viewDef->scissor );
+
+		// TBH I'm not 100% happy with how this is handled - you won't get reliable information
+		// on what's rendered in a mirror this way. Intersecting with the orig. scissor looks "best".
+		// For handling this "properly" we'd need the whole "locked viewDef vs real viewDef" thing
+		// for every subview (instead of just once for the primaryView) which would be a lot of
+		// work for a corner case...
+		scissor.Intersect( origScissor );
+		tr.viewDef = origViewDef;
+	} // DG end
+
 	// see what kind of subview we are making
 	if ( shader->GetSort() != SS_SUBVIEW ) {
 		for ( int i = 0 ; i < shader->GetNumStages() ; i++ ) {
diff --git a/neo/sound/snd_cache.cpp b/neo/sound/snd_cache.cpp
index 1e44bfc..5516682 100644
--- a/neo/sound/snd_cache.cpp
+++ b/neo/sound/snd_cache.cpp
@@ -501,7 +501,9 @@ void idSoundSample::Load( void ) {
 				hardwareBuffer = true;
 			}
 		}
+	}
 
+	{
 		// OGG decompressed at load time (when smaller than s_decompressionLimit seconds, 6 seconds by default)
 		if ( objectInfo.wFormatTag == WAVE_FORMAT_TAG_OGG ) {
 			if ( ( objectSize < ( ( int ) objectInfo.nSamplesPerSec * idSoundSystemLocal::s_decompressionLimit.GetInteger() ) ) ) {
diff --git a/neo/sound/snd_decoder.cpp b/neo/sound/snd_decoder.cpp
index f7040ad..52152d3 100644
--- a/neo/sound/snd_decoder.cpp
+++ b/neo/sound/snd_decoder.cpp
@@ -26,9 +26,17 @@ If you have questions concerning this license or the applicable additional terms
 ===========================================================================
 */
 
-#define OV_EXCLUDE_STATIC_CALLBACKS
-#include <vorbis/codec.h>
-#include <vorbis/vorbisfile.h>
+
+#include "SDL_endian.h"
+#if SDL_BYTEORDER == SDL_BIG_ENDIAN
+  #define STB_VORBIS_BIG_ENDIAN
+#endif
+#define STB_VORBIS_NO_STDIO
+#define STB_VORBIS_NO_PUSHDATA_API // we're using the pulldata API
+#include "stb_vorbis.h"
+#undef L // the implementation part of stb_vorbis has these defines, they confuse other code..
+#undef C
+#undef R
 
 #include "sys/platform.h"
 #include "framework/FileSystem.h"
@@ -49,6 +57,9 @@ idDynamicBlockAlloc<byte, 1<<20, 128>		decoderMemoryAllocator;
 
 const int MIN_OGGVORBIS_MEMORY				= 768 * 1024;
 
+// DG: this was only used with original Doom3's patched libvorbis
+// TODO: could use it in stb_vorbis setup_malloc() etc
+#if 0
 extern "C" {
 	void *_decoder_malloc( size_t size );
 	void *_decoder_calloc( size_t num, size_t size );
@@ -78,6 +89,51 @@ void *_decoder_realloc( void *memblock, size_t size ) {
 void _decoder_free( void *memblock ) {
 	decoderMemoryAllocator.Free( (byte *)memblock );
 }
+#endif
+
+static const char* my_stbv_strerror(int stbVorbisError)
+{
+	switch(stbVorbisError)
+	{
+		case VORBIS__no_error: return "No Error";
+#define ERRCASE(X) \
+		case VORBIS_ ## X : return #X;
+
+		ERRCASE( need_more_data )    // not a real error
+
+		ERRCASE( invalid_api_mixing )           // can't mix API modes
+		ERRCASE( outofmem )                     // not enough memory
+		ERRCASE( feature_not_supported )        // uses floor 0
+		ERRCASE( too_many_channels )            // STB_VORBIS_MAX_CHANNELS is too small
+		ERRCASE( file_open_failure )            // fopen() failed
+		ERRCASE( seek_without_length )          // can't seek in unknown-length file
+
+		ERRCASE( unexpected_eof )               // file is truncated?
+		ERRCASE( seek_invalid )                 // seek past EOF
+
+		// decoding errors (corrupt/invalid stream) -- you probably
+		// don't care about the exact details of these
+
+		// vorbis errors:
+		ERRCASE( invalid_setup )
+		ERRCASE( invalid_stream )
+
+		// ogg errors:
+		ERRCASE( missing_capture_pattern )
+		ERRCASE( invalid_stream_structure_version )
+		ERRCASE( continued_packet_flag_invalid )
+		ERRCASE( incorrect_stream_serial_number )
+		ERRCASE( invalid_first_page )
+		ERRCASE( bad_packet_type )
+		ERRCASE( cant_find_last_page )
+		ERRCASE( seek_failed )
+		ERRCASE( ogg_skeleton_not_supported )
+
+#undef ERRCASE
+	}
+	assert(0 && "unknown stb_vorbis errorcode!");
+	return "Unknown Error!";
+}
 
 
 /*
@@ -88,80 +144,12 @@ void _decoder_free( void *memblock ) {
 ===================================================================================
 */
 
-/*
-====================
-FS_ReadOGG
-====================
-*/
-size_t FS_ReadOGG( void *dest, size_t size1, size_t size2, void *fh ) {
-	idFile *f = reinterpret_cast<idFile *>(fh);
-	return f->Read( dest, size1 * size2 );
-}
-
-/*
-====================
-FS_SeekOGG
-====================
-*/
-int FS_SeekOGG( void *fh, ogg_int64_t to, int type ) {
-	fsOrigin_t retype = FS_SEEK_SET;
-
-	if ( type == SEEK_CUR ) {
-		retype = FS_SEEK_CUR;
-	} else if ( type == SEEK_END ) {
-		retype = FS_SEEK_END;
-	} else if ( type == SEEK_SET ) {
-		retype = FS_SEEK_SET;
-	} else {
-		common->FatalError( "fs_seekOGG: seek without type\n" );
-	}
-	idFile *f = reinterpret_cast<idFile *>(fh);
-	return f->Seek( to, retype );
-}
-
-/*
-====================
-FS_CloseOGG
-====================
-*/
-int FS_CloseOGG( void *fh ) {
-	return 0;
-}
-
-/*
-====================
-FS_TellOGG
-====================
-*/
-long FS_TellOGG( void *fh ) {
-	idFile *f = reinterpret_cast<idFile *>(fh);
-	return f->Tell();
-}
-
-/*
-====================
-ov_openFile
-====================
-*/
-int ov_openFile( idFile *f, OggVorbis_File *vf ) {
-	ov_callbacks callbacks;
-
-	memset( vf, 0, sizeof( OggVorbis_File ) );
-
-	callbacks.read_func = FS_ReadOGG;
-	callbacks.seek_func = FS_SeekOGG;
-	callbacks.close_func = FS_CloseOGG;
-	callbacks.tell_func = FS_TellOGG;
-	return ov_open_callbacks((void *)f, vf, NULL, -1, callbacks);
-}
-
 /*
 ====================
 idWaveFile::OpenOGG
 ====================
 */
 int idWaveFile::OpenOGG( const char* strFileName, waveformatex_t *pwfx ) {
-	OggVorbis_File *ov;
 
 	memset( pwfx, 0, sizeof( waveformatex_t ) );
 
@@ -172,11 +160,17 @@ int idWaveFile::OpenOGG( const char* strFileName, waveformatex_t *pwfx ) {
 
 	Sys_EnterCriticalSection( CRITICAL_SECTION_ONE );
 
-	ov = new OggVorbis_File;
+	int fileSize = mhmmio->Length();
+	byte* oggFileData = (byte*)Mem_Alloc( fileSize );
+
+	mhmmio->Read( oggFileData, fileSize );
 
-	if( ov_openFile( mhmmio, ov ) < 0 ) {
-		delete ov;
+	int stbverr = 0;
+	stb_vorbis *ov = stb_vorbis_open_memory( oggFileData, fileSize, &stbverr, NULL );
+	if( ov == NULL ) {
+		Mem_Free( oggFileData );
 		Sys_LeaveCriticalSection( CRITICAL_SECTION_ONE );
+		common->Warning( "Opening OGG file '%s' with stb_vorbis failed: %s\n", strFileName, my_stbv_strerror(stbverr) );
 		fileSystem->CloseFile( mhmmio );
 		mhmmio = NULL;
 		return -1;
@@ -184,20 +178,26 @@ int idWaveFile::OpenOGG( const char* strFileName, waveformatex_t *pwfx ) {
 
 	mfileTime = mhmmio->Timestamp();
 
-	vorbis_info *vi = ov_info( ov, -1 );
+	stb_vorbis_info stbvi = stb_vorbis_get_info( ov );
+	int numSamples = stb_vorbis_stream_length_in_samples( ov );
+	if(numSamples == 0) {
+		stbverr = stb_vorbis_get_error( ov );
+		common->Warning( "Couldn't get sound length of '%s' with stb_vorbis: %s\n", strFileName, my_stbv_strerror(stbverr) );
+		// TODO:  return -1 etc?
+	}
 
-	mpwfx.Format.nSamplesPerSec = vi->rate;
-	mpwfx.Format.nChannels = vi->channels;
+	mpwfx.Format.nSamplesPerSec = stbvi.sample_rate;
+	mpwfx.Format.nChannels = stbvi.channels;
 	mpwfx.Format.wBitsPerSample = sizeof(short) * 8;
-	mdwSize = ov_pcm_total( ov, -1 ) * vi->channels;	// pcm samples * num channels
+	mdwSize = numSamples * stbvi.channels;	// pcm samples * num channels
 	mbIsReadingFromMemory = false;
 
 	if ( idSoundSystemLocal::s_realTimeDecoding.GetBool() ) {
 
-		ov_clear( ov );
+		stb_vorbis_close( ov );
 		fileSystem->CloseFile( mhmmio );
 		mhmmio = NULL;
-		delete ov;
+		Mem_Free( oggFileData );
 
 		mpwfx.Format.wFormatTag = WAVE_FORMAT_TAG_OGG;
 		mhmmio = fileSystem->OpenFileRead( strFileName );
@@ -206,6 +206,7 @@ int idWaveFile::OpenOGG( const char* strFileName, waveformatex_t *pwfx ) {
 	} else {
 
 		ogg = ov;
+		oggData = oggFileData;
 
 		mpwfx.Format.wFormatTag = WAVE_FORMAT_TAG_PCM;
 		mMemSize = mdwSize * sizeof( short );
@@ -226,18 +227,27 @@ idWaveFile::ReadOGG
 ====================
 */
 int idWaveFile::ReadOGG( byte* pBuffer, int dwSizeToRead, int *pdwSizeRead ) {
-	int total = dwSizeToRead;
-	char *bufferPtr = (char *)pBuffer;
-	OggVorbis_File *ov = (OggVorbis_File *) ogg;
+	// DG: Note that stb_vorbis_get_samples_short_interleaved() operates on shorts,
+	//     while VorbisFile's ov_read() operates on bytes, so some numbers are different
+	int total = dwSizeToRead/sizeof(short);
+	short *bufferPtr = (short *)pBuffer;
+	stb_vorbis *ov = (stb_vorbis *) ogg;
 
 	do {
-		int ret = ov_read( ov, bufferPtr, total >= 4096 ? 4096 : total, Swap_IsBigEndian(), 2, 1, NULL );
+		int numShorts = total; // total >= 2048 ? 2048 : total; - I think stb_vorbis doesn't mind decoding all of it
+		int ret = stb_vorbis_get_samples_short_interleaved( ov, mpwfx.Format.nChannels, bufferPtr, numShorts );
 		if ( ret == 0 ) {
 			break;
 		}
 		if ( ret < 0 ) {
+			int stbverr = stb_vorbis_get_error( ov );
+			common->Warning( "idWaveFile::ReadOGG() stb_vorbis_get_samples_short_interleaved() %d shorts failed: %s\n", numShorts, my_stbv_strerror(stbverr) );
 			return -1;
 		}
+		// for some reason, stb_vorbis_get_samples_short_interleaved() takes the absolute
+		// number of shorts to read as a function argument, but returns the number of samples
+		// that were read PER CHANNEL
+		ret *= mpwfx.Format.nChannels;
 		bufferPtr += ret;
 		total -= ret;
 	} while( total > 0 );
@@ -257,15 +267,16 @@ idWaveFile::CloseOGG
 ====================
 */
 int idWaveFile::CloseOGG( void ) {
-	OggVorbis_File *ov = (OggVorbis_File *) ogg;
+	stb_vorbis* ov = (stb_vorbis *)ogg;
 	if ( ov != NULL ) {
 		Sys_EnterCriticalSection( CRITICAL_SECTION_ONE );
-		ov_clear( ov );
-		delete ov;
+		stb_vorbis_close( ov );
 		Sys_LeaveCriticalSection( CRITICAL_SECTION_ONE );
 		fileSystem->CloseFile( mhmmio );
 		mhmmio = NULL;
 		ogg = NULL;
+		Mem_Free( oggData );
+		oggData = NULL;
 		return 0;
 	}
 	return -1;
@@ -297,9 +308,8 @@ private:
 	idSoundSample *			lastSample;			// last sample being decoded
 	int						lastSampleOffset;	// last offset into the decoded sample
 	int						lastDecodeTime;		// last time decoding sound
-	idFile_Memory			file;				// encoded file in memory
 
-	OggVorbis_File			ogg;				// OggVorbis file
+	stb_vorbis*				stbv;				// stb_vorbis (Ogg) handle, using lastSample->nonCacheData
 };
 
 idBlockAlloc<idSampleDecoderLocal, 64>		sampleDecoderAllocator;
@@ -376,6 +386,7 @@ void idSampleDecoderLocal::Clear( void ) {
 	lastSample = NULL;
 	lastSampleOffset = 0;
 	lastDecodeTime = 0;
+	stbv = NULL;
 }
 
 /*
@@ -391,8 +402,8 @@ void idSampleDecoderLocal::ClearDecoder( void ) {
 			break;
 		}
 		case WAVE_FORMAT_TAG_OGG: {
-			ov_clear( &ogg );
-			memset( &ogg, 0, sizeof( ogg ) );
+			stb_vorbis_close( stbv );
+			stbv = NULL;
 			break;
 		}
 	}
@@ -526,8 +537,12 @@ int idSampleDecoderLocal::DecodeOGG( idSoundSample *sample, int sampleOffset44k,
 			failed = true;
 			return 0;
 		}
-		file.SetData( (const char *)sample->nonCacheData, sample->objectMemSize );
-		if ( ov_openFile( &file, &ogg ) < 0 ) {
+		assert(stbv == NULL);
+		int stbVorbErr = 0;
+		stbv = stb_vorbis_open_memory( sample->nonCacheData, sample->objectMemSize, &stbVorbErr, NULL );
+		if ( stbv == NULL ) {
+			common->Warning( "idSampleDecoderLocal::DecodeOGG() stb_vorbis_open_memory() for %s failed: %s\n",
+							 sample->name.c_str(), my_stbv_strerror(stbVorbErr) );
 			failed = true;
 			return 0;
 		}
@@ -535,9 +550,22 @@ int idSampleDecoderLocal::DecodeOGG( idSoundSample *sample, int sampleOffset44k,
 		lastSample = sample;
 	}
 
+	if( sample->objectInfo.nChannels > 2 ) {
+		assert( 0 && ">2 channels currently not supported (samplesBuf expects 1 or 2)" );
+		common->Warning( "Ogg Vorbis files with >2 channels are not supported!\n" );
+		// no idea if other parts of the engine support more than stereo;
+		// pretty sure though the standard gamedata doesn't use it (positional sounds must be mono anyway)
+		failed = true;
+		return 0;
+	}
+
 	// seek to the right offset if necessary
 	if ( sampleOffset != lastSampleOffset ) {
-		if ( ov_pcm_seek( &ogg, sampleOffset / sample->objectInfo.nChannels ) != 0 ) {
+		if ( stb_vorbis_seek( stbv, sampleOffset / sample->objectInfo.nChannels ) == 0 ) {
+			int stbVorbErr = stb_vorbis_get_error( stbv );
+			int offset = sampleOffset / sample->objectInfo.nChannels;
+			common->Warning( "idSampleDecoderLocal::DecodeOGG() stb_vorbis_seek(%d) for %s failed: %s\n",
+			                 offset, sample->name.c_str(), my_stbv_strerror( stbVorbErr ) );
 			failed = true;
 			return 0;
 		}
@@ -549,12 +577,37 @@ int idSampleDecoderLocal::DecodeOGG( idSoundSample *sample, int sampleOffset44k,
 	totalSamples = sampleCount;
 	readSamples = 0;
 	do {
-		float **samples;
-		int ret = ov_read_float( &ogg, &samples, totalSamples / sample->objectInfo.nChannels, NULL );
-		if ( ret == 0 ) {
-			failed = true;
+		// DG: in contrast to libvorbisfile's ov_read_float(), stb_vorbis_get_samples_float() expects you to
+		//     pass a buffer to store the decoded samples in, so limit it to 4096 samples/channel per iteration
+		float samplesBuf[2][MIXBUFFER_SAMPLES];
+		float* samples[2] = { samplesBuf[0], samplesBuf[1] };
+		int reqSamples = Min( MIXBUFFER_SAMPLES, totalSamples / sample->objectInfo.nChannels );
+		int ret = stb_vorbis_get_samples_float( stbv, sample->objectInfo.nChannels, samples, reqSamples );
+		if ( reqSamples == 0 ) {
+			// DG: it happened that sampleCount was an odd number in a *stereo* sound file
+			//  and eventually totalSamples was 1 and thus reqSamples = totalSamples/2 was 0
+			//  so this turned into an endless loop.. it shouldn't happen anymore due to changes
+			//  in idSoundWorldLocal::ReadFromSaveGame(), but better safe than sorry..
+			common->DPrintf( "idSampleDecoderLocal::DecodeOGG() reqSamples == 0\n  for %s ?!\n", sample->name.c_str() );
+			readSamples += totalSamples;
+			totalSamples = 0;
 			break;
 		}
+		if ( ret == 0 ) {
+			int stbVorbErr = stb_vorbis_get_error( stbv );
+			if ( stbVorbErr == VORBIS__no_error && reqSamples < 5 ) {
+				// DG: it sometimes happens that 0 is returned when reqSamples was 1 and there is no error.
+				// don't really know why; I'll just (arbitrarily) accept up to 5 "dropped" samples
+				ret = reqSamples; // pretend decoding went ok
+				common->DPrintf( "idSampleDecoderLocal::DecodeOGG() IGNORING stb_vorbis_get_samples_float() dropping %d (%d) samples\n  for %s\n",
+					reqSamples, totalSamples, sample->name.c_str() );
+			} else {
+				common->Warning( "idSampleDecoderLocal::DecodeOGG() stb_vorbis_get_samples_float() %d (%d) samples\n  for %s failed: %s\n",
+					reqSamples, totalSamples, sample->name.c_str(), my_stbv_strerror( stbVorbErr ) );
+				failed = true;
+				break;
+			}
+		}
 		if ( ret < 0 ) {
 			failed = true;
 			return 0;
diff --git a/neo/sound/snd_emitter.cpp b/neo/sound/snd_emitter.cpp
index 08c6fa3..e0230b3 100644
--- a/neo/sound/snd_emitter.cpp
+++ b/neo/sound/snd_emitter.cpp
@@ -186,6 +186,7 @@ void idSoundChannel::Clear( void ) {
 	memset( &parms, 0, sizeof(parms) );
 
 	triggered = false;
+	paused = false;
 	openalSource = 0;
 	openalStreamingOffset = 0;
 	openalStreamingBuffer[0] = openalStreamingBuffer[1] = openalStreamingBuffer[2] = 0;
@@ -227,6 +228,9 @@ void idSoundChannel::ALStop( void ) {
 	if ( alIsSource( openalSource ) ) {
 		alSourceStop( openalSource );
 		alSourcei( openalSource, AL_BUFFER, 0 );
+		// unassociate effect slot from source, so the effect slot can be deleted on shutdown
+		// even though the source itself is deleted later (in idSoundSystemLocal::Shutdown())
+		alSource3i( openalSource, AL_AUXILIARY_SEND_FILTER, AL_EFFECTSLOT_NULL, 0, AL_FILTER_NULL );
 		soundSystemLocal.FreeOpenALSource( openalSource );
 	}
 
@@ -959,6 +963,49 @@ void idSoundEmitterLocal::StopSound( const s_channelType channel ) {
 	Sys_LeaveCriticalSection();
 }
 
+// DG: to pause active OpenAL sources when entering menu etc
+void idSoundEmitterLocal::PauseAll( void ) {
+
+	Sys_EnterCriticalSection();
+
+	for( int i = 0; i < SOUND_MAX_CHANNELS; i++ ) {
+		idSoundChannel	*chan = &channels[i];
+
+		if ( !chan->triggerState ) {
+			continue;
+		}
+
+		if ( alIsSource( chan->openalSource ) ) {
+			alSourcePause( chan->openalSource );
+			chan->paused = true;
+		}
+	}
+
+	Sys_LeaveCriticalSection();
+}
+
+
+// DG: to resume active OpenAL sources when leaving menu etc
+void idSoundEmitterLocal::UnPauseAll( void ) {
+
+	Sys_EnterCriticalSection();
+
+	for( int i = 0; i < SOUND_MAX_CHANNELS; i++ ) {
+		idSoundChannel	*chan = &channels[i];
+
+		if ( !chan->triggerState ) {
+			continue;
+		}
+
+		if ( alIsSource( chan->openalSource ) && chan->paused ) {
+			alSourcePlay( chan->openalSource );
+			chan->paused = false;
+		}
+	}
+
+	Sys_LeaveCriticalSection();
+}
+
 /*
 ===================
 idSoundEmitterLocal::FadeSound
diff --git a/neo/sound/snd_local.h b/neo/sound/snd_local.h
index ccb1125..6c5790e 100644
--- a/neo/sound/snd_local.h
+++ b/neo/sound/snd_local.h
@@ -202,6 +202,7 @@ private:
 	dword			mulDataSize;
 
 	void *			ogg;			// only !NULL when !s_realTimeDecoding
+	byte*			oggData; // the contents of the .ogg for stb_vorbis (it doesn't support custom reading callbacks)
 	bool			isOgg;
 
 private:
@@ -386,6 +387,8 @@ public:
 	ALuint				lastopenalStreamingBuffer[3];
 	bool				stopped;
 
+	bool				paused;					// DG: currently paused, but generally still playing - for when menu is open etc
+
 	bool				disallowSlow;
 
 };
@@ -428,6 +431,9 @@ public:
 
 	void				Clear( void );
 
+	void				PauseAll( void );   // DG: to pause active OpenAL sources when entering menu etc
+	void				UnPauseAll( void ); // DG: to resume active OpenAL sources when leaving menu etc
+
 	void				OverrideParms( const soundShaderParms_t *base, const soundShaderParms_t *over, soundShaderParms_t *out );
 	void				CheckForCompletion( int current44kHzTime );
 	void				Spatialize( idVec3 listenerPos, int listenerArea, idRenderWorld *rw );
@@ -595,6 +601,7 @@ public:
 	ALuint					listenerSlot;
 	bool					listenerAreFiltersInitialized;
 	ALuint					listenerFilters[2]; // 0 - direct; 1 - send.
+	float					listenerSlotReverbGain;
 
 	int						gameMsec;
 	int						game44kHz;
@@ -745,6 +752,7 @@ public:
 	LPALDELETEAUXILIARYEFFECTSLOTS	alDeleteAuxiliaryEffectSlots;
 	LPALISAUXILIARYEFFECTSLOT		alIsAuxiliaryEffectSlot;
 	LPALAUXILIARYEFFECTSLOTI		alAuxiliaryEffectSloti;
+	LPALAUXILIARYEFFECTSLOTF		alAuxiliaryEffectSlotf;
 
 	idEFXFile				EFXDatabase;
 	bool					efxloaded;
@@ -787,6 +795,8 @@ public:
 	static idCVar			s_useEAXReverb;
 	static idCVar			s_decompressionLimit;
 
+	static idCVar			s_alReverbGain;
+
 	static idCVar			s_slowAttenuate;
 
 	static idCVar			s_enviroSuitCutoffFreq;
diff --git a/neo/sound/snd_system.cpp b/neo/sound/snd_system.cpp
index d9a2848..3b070a8 100644
--- a/neo/sound/snd_system.cpp
+++ b/neo/sound/snd_system.cpp
@@ -29,6 +29,7 @@ If you have questions concerning this license or the applicable additional terms
 #include "sys/platform.h"
 
 #include "sound/snd_local.h"
+#include <limits.h>
 
 #ifdef ID_DEDICATED
 idCVar idSoundSystemLocal::s_noSound( "s_noSound", "1", CVAR_SOUND | CVAR_BOOL | CVAR_ROM, "" );
@@ -77,6 +78,8 @@ idCVar idSoundSystemLocal::s_useEAXReverb( "s_useEAXReverb", "0", CVAR_SOUND | C
 idCVar idSoundSystemLocal::s_decompressionLimit( "s_decompressionLimit", "6", CVAR_SOUND | CVAR_INTEGER | CVAR_ROM, "specifies maximum uncompressed sample length in seconds" );
 #endif
 
+idCVar idSoundSystemLocal::s_alReverbGain( "s_alReverbGain", "0.5", CVAR_SOUND | CVAR_FLOAT | CVAR_ARCHIVE, "reduce reverb strength (0.0 to 1.0)", 0.0f, 1.0f );
+
 bool idSoundSystemLocal::useEFXReverb = false;
 int idSoundSystemLocal::EFXAvailable = -1;
 
@@ -429,6 +432,7 @@ void idSoundSystemLocal::Init() {
 			alDeleteAuxiliaryEffectSlots = (LPALDELETEAUXILIARYEFFECTSLOTS)alGetProcAddress("alDeleteAuxiliaryEffectSlots");
 			alIsAuxiliaryEffectSlot = (LPALISAUXILIARYEFFECTSLOT)alGetProcAddress("alIsAuxiliaryEffectSlot");;
 			alAuxiliaryEffectSloti = (LPALAUXILIARYEFFECTSLOTI)alGetProcAddress("alAuxiliaryEffectSloti");
+			alAuxiliaryEffectSlotf = (LPALAUXILIARYEFFECTSLOTF)alGetProcAddress("alAuxiliaryEffectSlotf");
 		} else {
 			common->Printf( "OpenAL: EFX extension not found\n" );
 			EFXAvailable = 0;
@@ -449,6 +453,7 @@ void idSoundSystemLocal::Init() {
 			alDeleteAuxiliaryEffectSlots = NULL;
 			alIsAuxiliaryEffectSlot = NULL;
 			alAuxiliaryEffectSloti = NULL;
+			alAuxiliaryEffectSlotf = NULL;
 		}
 
 		ALuint handle;
@@ -789,7 +794,19 @@ int idSoundSystemLocal::AsyncUpdateWrite( int inTime ) {
 		return 0;
 	}
 
-	int sampleTime = inTime * 44.1f;
+	// inTime is in milliseconds and overflows if the game runs for long enough;
+	// multiplied by 44.1 it overflows even sooner, so calculate in 64 bits first
+	// (and use double, because float loses precision at bigger numbers)
+	// and then manually truncate to a regular int afterwards - this should at least
+	// prevent sampleTime from becoming negative (as long as inTime is not)
+	long long int sampleTime64 = double( inTime ) * 44.1;
+
+	// furthermore, sampleTime should be divisible by 8
+	// (at least by 4 for handling 11kHz samples) so round to nearest multiple of 8
+	sampleTime64 = (sampleTime64 + 4) & ~(long long int)7;
+
+	const int sampleTime = sampleTime64 & INT_MAX;
+
 	int numSpeakers = s_numberOfSpeakers.GetInteger();
 
 	// enable audio hardware caching
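Review note on the AsyncUpdateWrite() hunk above: a minimal standalone sketch of the same conversion, assuming a free function in place of the inline code. The name MsecToSampleTime and the precondition assert are hypothetical and not part of the patch. At 44.1 samples per millisecond the product passes INT_MAX after roughly 13.5 hours, well before the millisecond counter itself overflows.

```cpp
#include <cassert>
#include <climits>

// Hypothetical standalone helper; the name MsecToSampleTime is not part of the patch.
int MsecToSampleTime( int inTimeMsec ) {
	// the patch only promises a non-negative result for non-negative input
	assert( inTimeMsec >= 0 );

	// widen before scaling: the product exceeds INT_MAX long before inTimeMsec does
	long long sampleTime64 = static_cast<long long>( double( inTimeMsec ) * 44.1 );

	// round to the nearest multiple of 8
	sampleTime64 = ( sampleTime64 + 4 ) & ~7LL;

	// keep only the low 31 bits, so the int result wraps around instead of going negative
	return static_cast<int>( sampleTime64 & INT_MAX );
}
```

Because the low three bits are cleared before masking with INT_MAX, the result stays divisible by 8 even after it wraps.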
diff --git a/neo/sound/snd_wavefile.cpp b/neo/sound/snd_wavefile.cpp
index 556b1e5..5d16fcd 100644
--- a/neo/sound/snd_wavefile.cpp
+++ b/neo/sound/snd_wavefile.cpp
@@ -46,6 +46,7 @@ idWaveFile::idWaveFile( void ) {
 	mpbData		= NULL;
 	ogg			= NULL;
 	isOgg		= false;
+	oggData		= NULL;
 }
 
 //-----------------------------------------------------------------------------
diff --git a/neo/sound/snd_world.cpp b/neo/sound/snd_world.cpp
index 5ef7094..46b5056 100644
--- a/neo/sound/snd_world.cpp
+++ b/neo/sound/snd_world.cpp
@@ -93,6 +93,9 @@ void idSoundWorldLocal::Init( idRenderWorld *renderWorld ) {
 				// pow(10.0, (-1150*1.5)/2000.0)
 				soundSystemLocal.alFilterf(listenerFilters[1], AL_LOWPASS_GAINHF, 0.137246f);
 			}
+			// allow reducing the reverb gain globally via s_alReverbGain CVar
+			listenerSlotReverbGain = soundSystemLocal.s_alReverbGain.GetFloat();
+			soundSystemLocal.alAuxiliaryEffectSlotf(listenerSlot, AL_EFFECTSLOT_GAIN, listenerSlotReverbGain);
 		}
 	}
 
@@ -127,7 +130,10 @@ idSoundWorldLocal::idSoundWorldLocal
 ===============
 */
 idSoundWorldLocal::idSoundWorldLocal() {
+	listenerEffect                = 0;
+	listenerSlot                  = 0;
 	listenerAreFiltersInitialized = false;
+	listenerSlotReverbGain = 1.0f;
 }
 
 /*
@@ -155,6 +161,15 @@ void idSoundWorldLocal::Shutdown() {
 
 	AVIClose();
 
+	// delete emitters before deleting the listenerSlot, so their sources aren't
+	// associated with the listenerSlot anymore
+	for ( i = 0; i < emitters.Num(); i++ ) {
+		if ( emitters[i] ) {
+			delete emitters[i];
+			emitters[i] = NULL;
+		}
+	}
+
 	if (idSoundSystemLocal::useEFXReverb) {
 		if (soundSystemLocal.alIsAuxiliaryEffectSlot(listenerSlot)) {
 			soundSystemLocal.alAuxiliaryEffectSloti(listenerSlot, AL_EFFECTSLOT_EFFECT, AL_EFFECTSLOT_NULL);
@@ -171,14 +186,9 @@ void idSoundWorldLocal::Shutdown() {
 				listenerFilters[1] = AL_FILTER_NULL;
 			}
 		}
+		listenerSlotReverbGain = 1.0f;
 	}
 
-	for ( i = 0; i < emitters.Num(); i++ ) {
-		if ( emitters[i] ) {
-			delete emitters[i];
-			emitters[i] = NULL;
-		}
-	}
 	localSound = NULL;
 }
 
@@ -519,6 +529,13 @@ void idSoundWorldLocal::MixLoop( int current44kHz, int numSpeakers, float *final
 		ALuint effect = 0;
 		idStr s(listenerArea);
 
+		// allow reducing the reverb gain globally via s_alReverbGain CVar
+		float gain = soundSystemLocal.s_alReverbGain.GetFloat();
+		if (listenerSlotReverbGain != gain) {
+			listenerSlotReverbGain = gain;
+			soundSystemLocal.alAuxiliaryEffectSlotf(listenerSlot, AL_EFFECTSLOT_GAIN, gain);
+		}
+
 		bool found = soundSystemLocal.EFXDatabase.FindEffect(s, &effect);
 		if (!found) {
 			s = listenerAreaName;
@@ -1366,6 +1383,11 @@ void idSoundWorldLocal::ReadFromSaveGame( idFile *savefile ) {
 			// make sure we start up the hardware voice if needed
 			chan->triggered = chan->triggerState;
 			chan->openalStreamingOffset = currentSoundTime - chan->trigger44kHzTime;
+			// DG: round up openalStreamingOffset to a multiple of 8, so we still get an even number
+			//  when calculating "how many 11kHz stereo samples do we need to decode" and don't
+			//  run into an "apparently I need one more sample, so decode 0 stereo samples"
+			//  situation that could cause an endless loop (44kHz/11kHz = 4; *2 for stereo => 8)
+			chan->openalStreamingOffset = (chan->openalStreamingOffset+7) & ~7;
 
 			// adjust the hardware fade time
 			if ( chan->channelFade.fadeStart44kHz != 0 ) {
@@ -1477,6 +1499,16 @@ void idSoundWorldLocal::Pause( void ) {
 	}
 
 	pause44kHz = soundSystemLocal.GetCurrent44kHzTime();
+
+	for ( int i = 0; i < emitters.Num(); i++ ) {
+		idSoundEmitterLocal * emitter = emitters[i];
+
+		// if no channels are active, do nothing
+		if ( emitter == NULL || !emitter->playing ) {
+			continue;
+		}
+		emitter->PauseAll();
+	}
 }
 
 /*
@@ -1496,6 +1528,16 @@ void idSoundWorldLocal::UnPause( void ) {
 	OffsetSoundTime( offset44kHz );
 
 	pause44kHz = -1;
+
+	for ( int i = 0; i < emitters.Num(); i++ ) {
+		idSoundEmitterLocal * emitter = emitters[i];
+
+		// if no channels are active, do nothing
+		if ( emitter == NULL || !emitter->playing ) {
+			continue;
+		}
+		emitter->UnPauseAll();
+	}
 }
 
 /*
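Review note on the ReadFromSaveGame() hunk of snd_world.cpp above: it rounds openalStreamingOffset up to a multiple of 8, whereas AsyncUpdateWrite() in snd_system.cpp rounds to the nearest multiple. A minimal sketch of the round-up variant follows, assuming a non-negative offset. The helper name RoundUpToMultipleOf8 is hypothetical and not part of the patch.

```cpp
#include <cassert>

// Hypothetical helper; assumes a non-negative offset measured in 44kHz samples.
int RoundUpToMultipleOf8( int offset44kHz ) {
	assert( offset44kHz >= 0 );
	// adding 7 carries into the next multiple unless the value already is one,
	// then clearing the low three bits snaps it onto that multiple
	return ( offset44kHz + 7 ) & ~7;
}
```

Rounding up means a restored channel never resumes before the offset stored in the savegame. Using +4 instead of +7 would round to the nearest multiple, which can land below it.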
diff --git a/neo/sound/stb_vorbis.h b/neo/sound/stb_vorbis.h
new file mode 100644
index 0000000..3e5c250
--- /dev/null
+++ b/neo/sound/stb_vorbis.h
@@ -0,0 +1,5584 @@
+// Ogg Vorbis audio decoder - v1.22 - public domain
+// http://nothings.org/stb_vorbis/
+//
+// Original version written by Sean Barrett in 2007.
+//
+// Originally sponsored by RAD Game Tools. Seeking implementation
+// sponsored by Phillip Bennefall, Marc Andersen, Aaron Baker,
+// Elias Software, Aras Pranckevicius, and Sean Barrett.
+//
+// LICENSE
+//
+//   See end of file for license information.
+//
+// Limitations:
+//
+//   - floor 0 not supported (used in old ogg vorbis files pre-2004)
+//   - lossless sample-truncation at beginning ignored
+//   - cannot concatenate multiple vorbis streams
+//   - sample positions are 32-bit, limiting seekable 192Khz
+//       files to around 6 hours (Ogg supports 64-bit)
+//
+// Feature contributors:
+//    Dougall Johnson (sample-exact seeking)
+//
+// Bugfix/warning contributors:
+//    Terje Mathisen     Niklas Frykholm     Andy Hill
+//    Casey Muratori     John Bolton         Gargaj
+//    Laurent Gomila     Marc LeBlanc        Ronny Chevalier
+//    Bernhard Wodo      Evan Balster        github:alxprd
+//    Tom Beaumont       Ingo Leitgeb        Nicolas Guillemot
+//    Phillip Bennefall  Rohit               Thiago Goulart
+//    github:manxorist   Saga Musix          github:infatum
+//    Timur Gagiev       Maxwell Koo         Peter Waller
+//    github:audinowho   Dougall Johnson     David Reid
+//    github:Clownacy    Pedro J. Estebanez  Remi Verschelde
+//    AnthoFoxo          github:morlat       Gabriel Ravier
+//
+// Partial history:
+//    1.22    - 2021-07-11 - various small fixes
+//    1.21    - 2021-07-02 - fix bug for files with no comments
+//    1.20    - 2020-07-11 - several small fixes
+//    1.19    - 2020-02-05 - warnings
+//    1.18    - 2020-02-02 - fix seek bugs; parse header comments; misc warnings etc.
+//    1.17    - 2019-07-08 - fix CVE-2019-13217..CVE-2019-13223 (by ForAllSecure)
+//    1.16    - 2019-03-04 - fix warnings
+//    1.15    - 2019-02-07 - explicit failure if Ogg Skeleton data is found
+//    1.14    - 2018-02-11 - delete bogus dealloca usage
+//    1.13    - 2018-01-29 - fix truncation of last frame (hopefully)
+//    1.12    - 2017-11-21 - limit residue begin/end to blocksize/2 to avoid large temp allocs in bad/corrupt files
+//    1.11    - 2017-07-23 - fix MinGW compilation
+//    1.10    - 2017-03-03 - more robust seeking; fix negative ilog(); clear error in open_memory
+//    1.09    - 2016-04-04 - back out 'truncation of last frame' fix from previous version
+//    1.08    - 2016-04-02 - warnings; setup memory leaks; truncation of last frame
+//    1.07    - 2015-01-16 - fixes for crashes on invalid files; warning fixes; const
+//    1.06    - 2015-08-31 - full, correct support for seeking API (Dougall Johnson)
+//                           some crash fixes when out of memory or with corrupt files
+//                           fix some inappropriately signed shifts
+//    1.05    - 2015-04-19 - don't define __forceinline if it's redundant
+//    1.04    - 2014-08-27 - fix missing const-correct case in API
+//    1.03    - 2014-08-07 - warning fixes
+//    1.02    - 2014-07-09 - declare qsort comparison as explicitly _cdecl in Windows
+//    1.01    - 2014-06-18 - fix stb_vorbis_get_samples_float (interleaved was correct)
+//    1.0     - 2014-05-26 - fix memory leaks; fix warnings; fix bugs in >2-channel;
+//                           (API change) report sample rate for decode-full-file funcs
+//
+// See end of file for full version history.
+
+
+//////////////////////////////////////////////////////////////////////////////
+//
+//  HEADER BEGINS HERE
+//
+
+#ifndef STB_VORBIS_INCLUDE_STB_VORBIS_H
+#define STB_VORBIS_INCLUDE_STB_VORBIS_H
+
+#if defined(STB_VORBIS_NO_CRT) && !defined(STB_VORBIS_NO_STDIO)
+#define STB_VORBIS_NO_STDIO 1
+#endif
+
+#ifndef STB_VORBIS_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+///////////   THREAD SAFETY
+
+// Individual stb_vorbis* handles are not thread-safe; you cannot decode from
+// them from multiple threads at the same time. However, you can have multiple
+// stb_vorbis* handles and decode from them independently in multiple threads.
+
+
+///////////   MEMORY ALLOCATION
+
+// normally stb_vorbis uses malloc() to allocate memory at startup,
+// and alloca() to allocate temporary memory during a frame on the
+// stack. (Memory consumption will depend on the amount of setup
+// data in the file and how you set the compile flags for speed
+// vs. size. In my test files the maximal-size usage is ~150KB.)
+//
+// You can modify the wrapper functions in the source (setup_malloc,
+// setup_temp_malloc, temp_malloc) to change this behavior, or you
+// can use a simpler allocation model: you pass in a buffer from
+// which stb_vorbis will allocate _all_ its memory (including the
+// temp memory). "open" may fail with a VORBIS_outofmem if you
+// do not pass in enough data; there is no way to determine how
+// much you do need except to succeed (at which point you can
+// query get_info to find the exact amount required. yes I know
+// this is lame).
+//
+// If you pass in a non-NULL buffer of the type below, allocation
+// will occur from it as described above. Otherwise just pass NULL
+// to use malloc()/alloca()
+
+typedef struct
+{
+   char *alloc_buffer;
+   int   alloc_buffer_length_in_bytes;
+} stb_vorbis_alloc;
+
+
+///////////   FUNCTIONS USEABLE WITH ALL INPUT MODES
+
+typedef struct stb_vorbis stb_vorbis;
+
+typedef struct
+{
+   unsigned int sample_rate;
+   int channels;
+
+   unsigned int setup_memory_required;
+   unsigned int setup_temp_memory_required;
+   unsigned int temp_memory_required;
+
+   int max_frame_size;
+} stb_vorbis_info;
+
+typedef struct
+{
+   char *vendor;
+
+   int comment_list_length;
+   char **comment_list;
+} stb_vorbis_comment;
+
+// get general information about the file
+extern stb_vorbis_info stb_vorbis_get_info(stb_vorbis *f);
+
+// get ogg comments
+extern stb_vorbis_comment stb_vorbis_get_comment(stb_vorbis *f);
+
+// get the last error detected (clears it, too)
+extern int stb_vorbis_get_error(stb_vorbis *f);
+
+// close an ogg vorbis file and free all memory in use
+extern void stb_vorbis_close(stb_vorbis *f);
+
+// this function returns the offset (in samples) from the beginning of the
+// file that will be returned by the next decode, if it is known, or -1
+// otherwise. after a flush_pushdata() call, this may take a while before
+// it becomes valid again.
+// NOT WORKING YET after a seek with PULLDATA API
+extern int stb_vorbis_get_sample_offset(stb_vorbis *f);
+
+// returns the current seek point within the file, or offset from the beginning
+// of the memory buffer. In pushdata mode it returns 0.
+extern unsigned int stb_vorbis_get_file_offset(stb_vorbis *f);
+
+///////////   PUSHDATA API
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+
+// this API allows you to get blocks of data from any source and hand
+// them to stb_vorbis. you have to buffer them; stb_vorbis will tell
+// you how much it used, and you have to give it the rest next time;
+// and stb_vorbis may not have enough data to work with and you will
+// need to give it the same data again PLUS more. Note that the Vorbis
+// specification does not bound the size of an individual frame.
+
+extern stb_vorbis *stb_vorbis_open_pushdata(
+         const unsigned char * datablock, int datablock_length_in_bytes,
+         int *datablock_memory_consumed_in_bytes,
+         int *error,
+         const stb_vorbis_alloc *alloc_buffer);
+// create a vorbis decoder by passing in the initial data block containing
+//    the ogg&vorbis headers (you don't need to parse them, just provide
+//    the first N bytes of the file--you're told if it's not enough, see below)
+// on success, returns an stb_vorbis *, does not set error, returns the amount of
+//    data parsed/consumed on this call in *datablock_memory_consumed_in_bytes;
+// on failure, returns NULL and sets *error, does not change *datablock_memory_consumed
+// if returns NULL and *error is VORBIS_need_more_data, then the input block was
+//       incomplete and you need to pass in a larger block from the start of the file
+
+extern int stb_vorbis_decode_frame_pushdata(
+         stb_vorbis *f,
+         const unsigned char *datablock, int datablock_length_in_bytes,
+         int *channels,             // place to write number of float * buffers
+         float ***output,           // place to write float ** array of float * buffers
+         int *samples               // place to write number of output samples
+     );
+// decode a frame of audio sample data if possible from the passed-in data block
+//
+// return value: number of bytes we used from datablock
+//
+// possible cases:
+//     0 bytes used, 0 samples output (need more data)
+//     N bytes used, 0 samples output (resynching the stream, keep going)
+//     N bytes used, M samples output (one frame of data)
+// note that after opening a file, you will ALWAYS get one N-bytes,0-sample
+// frame, because Vorbis always "discards" the first frame.
+//
+// Note that on resynch, stb_vorbis will rarely consume all of the buffer,
+// instead only datablock_length_in_bytes-3 or less. This is because it wants
+// to avoid missing parts of a page header if they cross a datablock boundary,
+// without writing state-machiney code to record a partial detection.
+//
+// The number of channels returned is stored in *channels (which can be
+// NULL--it is always the same as the number of channels reported by
+// get_info). *output will contain an array of float* buffers, one per
+// channel. In other words, (*output)[0][0] contains the first sample from
+// the first channel, and (*output)[1][0] contains the first sample from
+// the second channel.
+//
+// *output points into stb_vorbis's internal output buffer storage; these
+// buffers are owned by stb_vorbis and application code should not free
+// them or modify their contents. They are transient and will be overwritten
+// once you ask for more data to get decoded, so be sure to grab any data
+// you need before then.
+
+extern void stb_vorbis_flush_pushdata(stb_vorbis *f);
+// inform stb_vorbis that your next datablock will not be contiguous with
+// previous ones (e.g. you've seeked in the data); future attempts to decode
+// frames will cause stb_vorbis to resynchronize (as noted above), and
+// once it sees a valid Ogg page (typically 4-8KB, as large as 64KB), it
+// will begin decoding the _next_ frame.
+//
+// if you want to seek using pushdata, you need to seek in your file, then
+// call stb_vorbis_flush_pushdata(), then start calling decoding, then once
+// decoding is returning you data, call stb_vorbis_get_sample_offset, and
+// if you don't like the result, seek your file again and repeat.
+#endif
+
+
+//////////   PULLING INPUT API
+
+#ifndef STB_VORBIS_NO_PULLDATA_API
+// This API assumes stb_vorbis is allowed to pull data from a source--
+// either a block of memory containing the _entire_ vorbis stream, or a
+// FILE * that you or it create, or possibly some other reading mechanism
+// if you go modify the source to replace the FILE * case with some kind
+// of callback to your code. (But if you don't support seeking, you may
+// just want to go ahead and use pushdata.)
+
+#if !defined(STB_VORBIS_NO_STDIO) && !defined(STB_VORBIS_NO_INTEGER_CONVERSION)
+extern int stb_vorbis_decode_filename(const char *filename, int *channels, int *sample_rate, short **output);
+#endif
+#if !defined(STB_VORBIS_NO_INTEGER_CONVERSION)
+extern int stb_vorbis_decode_memory(const unsigned char *mem, int len, int *channels, int *sample_rate, short **output);
+#endif
+// decode an entire file and output the data interleaved into a malloc()ed
+// buffer stored in *output. The return value is the number of samples
+// decoded, or -1 if the file could not be opened or was not an ogg vorbis file.
+// When you're done with it, just free() the pointer returned in *output.
+
+extern stb_vorbis * stb_vorbis_open_memory(const unsigned char *data, int len,
+                                  int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from an ogg vorbis stream in memory (note
+// this must be the entire stream!). on failure, returns NULL and sets *error
+
+#ifndef STB_VORBIS_NO_STDIO
+extern stb_vorbis * stb_vorbis_open_filename(const char *filename,
+                                  int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from a filename via fopen(). on failure,
+// returns NULL and sets *error (possibly to VORBIS_file_open_failure).
+
+extern stb_vorbis * stb_vorbis_open_file(FILE *f, int close_handle_on_close,
+                                  int *error, const stb_vorbis_alloc *alloc_buffer);
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell). on failure, returns NULL and sets *error.
+// note that stb_vorbis must "own" this stream; if you seek it in between
+// calls to stb_vorbis, it will become confused. Moreover, if you attempt to
+// perform stb_vorbis_seek_*() operations on this file, it will assume it
+// owns the _entire_ rest of the file after the start point. Use the next
+// function, stb_vorbis_open_file_section(), to limit it.
+
+extern stb_vorbis * stb_vorbis_open_file_section(FILE *f, int close_handle_on_close,
+                int *error, const stb_vorbis_alloc *alloc_buffer, unsigned int len);
+// create an ogg vorbis decoder from an open FILE *, looking for a stream at
+// the _current_ seek point (ftell); the stream will be of length 'len' bytes.
+// on failure, returns NULL and sets *error. note that stb_vorbis must "own"
+// this stream; if you seek it in between calls to stb_vorbis, it will become
+// confused.
+#endif
+
+extern int stb_vorbis_seek_frame(stb_vorbis *f, unsigned int sample_number);
+extern int stb_vorbis_seek(stb_vorbis *f, unsigned int sample_number);
+// these functions seek in the Vorbis file to (approximately) 'sample_number'.
+// after calling seek_frame(), the next call to get_frame_*() will include
+// the specified sample. after calling stb_vorbis_seek(), the next call to
+// stb_vorbis_get_samples_* will start with the specified sample. If you
+// do not need to seek to EXACTLY the target sample when using get_samples_*,
+// you can also use seek_frame().
+
+extern int stb_vorbis_seek_start(stb_vorbis *f);
+// this function is equivalent to stb_vorbis_seek(f,0)
+
+extern unsigned int stb_vorbis_stream_length_in_samples(stb_vorbis *f);
+extern float        stb_vorbis_stream_length_in_seconds(stb_vorbis *f);
+// these functions return the total length of the vorbis stream
+
+extern int stb_vorbis_get_frame_float(stb_vorbis *f, int *channels, float ***output);
+// decode the next frame and return the number of samples. the number of
+// channels returned is stored in *channels (which can be NULL--it is always
+// the same as the number of channels reported by get_info). *output will
+// contain an array of float* buffers, one per channel. These outputs will
+// be overwritten on the next call to stb_vorbis_get_frame_*.
+//
+// You generally should not intermix calls to stb_vorbis_get_frame_*()
+// and stb_vorbis_get_samples_*(), since the latter calls the former.
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+extern int stb_vorbis_get_frame_short_interleaved(stb_vorbis *f, int num_c, short *buffer, int num_shorts);
+extern int stb_vorbis_get_frame_short            (stb_vorbis *f, int num_c, short **buffer, int num_samples);
+#endif
+// decode the next frame and return the number of *samples* per channel.
+// Note that for interleaved data, you pass in the number of shorts (the
+// size of your array), but the return value is the number of samples per
+// channel, not the total number of samples.
+//
+// The data is coerced to the number of channels you request according to the
+// channel coercion rules (see below). You must pass in the size of your
+// buffer(s) so that stb_vorbis will not overwrite the end of the buffer.
+// The maximum buffer size needed can be gotten from get_info(); however,
+// the Vorbis I specification implies an absolute maximum of 4096 samples
+// per channel.
+
+// Channel coercion rules:
+//    Let M be the number of channels requested, and N the number of channels present,
+//    and Cn be the nth channel; let stereo L be the sum of all L and center channels,
+//    and stereo R be the sum of all R and center channels (channel assignment from the
+//    vorbis spec).
+//        M    N       output
+//        1    k      sum(Ck) for all k
+//        2    *      stereo L, stereo R
+//        k    l      k > l, the first l channels, then 0s
+//        k    l      k <= l, the first k channels
+//    Note that this is not _good_ surround etc. mixing at all! It's just so
+//    you get something useful.
+
+extern int stb_vorbis_get_samples_float_interleaved(stb_vorbis *f, int channels, float *buffer, int num_floats);
+extern int stb_vorbis_get_samples_float(stb_vorbis *f, int channels, float **buffer, int num_samples);
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. DOES NOT APPLY THE COERCION RULES.
+// Returns the number of samples stored per channel; it may be less than requested
+// at the end of the file. If there are no more samples in the file, returns 0.
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+extern int stb_vorbis_get_samples_short_interleaved(stb_vorbis *f, int channels, short *buffer, int num_shorts);
+extern int stb_vorbis_get_samples_short(stb_vorbis *f, int channels, short **buffer, int num_samples);
+#endif
+// gets num_samples samples, not necessarily on a frame boundary--this requires
+// buffering so you have to supply the buffers. Applies the coercion rules above
+// to produce 'channels' channels. Returns the number of samples stored per channel;
+// it may be less than requested at the end of the file. If there are no more
+// samples in the file, returns 0.
+
+#endif
+
+////////   ERROR CODES
+
+enum STBVorbisError
+{
+   VORBIS__no_error,
+
+   VORBIS_need_more_data=1,             // not a real error
+
+   VORBIS_invalid_api_mixing,           // can't mix API modes
+   VORBIS_outofmem,                     // not enough memory
+   VORBIS_feature_not_supported,        // uses floor 0
+   VORBIS_too_many_channels,            // STB_VORBIS_MAX_CHANNELS is too small
+   VORBIS_file_open_failure,            // fopen() failed
+   VORBIS_seek_without_length,          // can't seek in unknown-length file
+
+   VORBIS_unexpected_eof=10,            // file is truncated?
+   VORBIS_seek_invalid,                 // seek past EOF
+
+   // decoding errors (corrupt/invalid stream) -- you probably
+   // don't care about the exact details of these
+
+   // vorbis errors:
+   VORBIS_invalid_setup=20,
+   VORBIS_invalid_stream,
+
+   // ogg errors:
+   VORBIS_missing_capture_pattern=30,
+   VORBIS_invalid_stream_structure_version,
+   VORBIS_continued_packet_flag_invalid,
+   VORBIS_incorrect_stream_serial_number,
+   VORBIS_invalid_first_page,
+   VORBIS_bad_packet_type,
+   VORBIS_cant_find_last_page,
+   VORBIS_seek_failed,
+   VORBIS_ogg_skeleton_not_supported
+};
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // STB_VORBIS_INCLUDE_STB_VORBIS_H
+//
+//  HEADER ENDS HERE
+//
+//////////////////////////////////////////////////////////////////////////////
+
+#ifndef STB_VORBIS_HEADER_ONLY
+
+// global configuration settings (e.g. set these in the project/makefile),
+// or just set them in this file at the top (although ideally the first few
+// should be visible when the header file is compiled too, although it's not
+// crucial)
+
+// STB_VORBIS_NO_PUSHDATA_API
+//     does not compile the code for the various stb_vorbis_*_pushdata()
+//     functions
+// #define STB_VORBIS_NO_PUSHDATA_API
+
+// STB_VORBIS_NO_PULLDATA_API
+//     does not compile the code for the non-pushdata APIs
+// #define STB_VORBIS_NO_PULLDATA_API
+
+// STB_VORBIS_NO_STDIO
+//     does not compile the code for the APIs that use FILE *s internally
+//     or externally (implied by STB_VORBIS_NO_PULLDATA_API)
+// #define STB_VORBIS_NO_STDIO
+
+// STB_VORBIS_NO_INTEGER_CONVERSION
+//     does not compile the code for converting audio sample data from
+//     float to integer (implied by STB_VORBIS_NO_PULLDATA_API)
+// #define STB_VORBIS_NO_INTEGER_CONVERSION
+
+// STB_VORBIS_NO_FAST_SCALED_FLOAT
+//      does not use a fast float-to-int trick to accelerate float-to-int on
+//      most platforms which requires endianness be defined correctly.
+//#define STB_VORBIS_NO_FAST_SCALED_FLOAT
+
+
+// STB_VORBIS_MAX_CHANNELS [number]
+//     globally define this to the maximum number of channels you need.
+//     The spec does not put a restriction on channels except that
+//     the count is stored in a byte, so 255 is the hard limit.
+//     Reducing this saves about 16 bytes per value, so using 16 saves
+//     (255-16)*16 or around 4KB. Plus any other memory usage
+//     I forgot to account for. Can probably go as low as 8 (7.1 audio),
+//     6 (5.1 audio), or 2 (stereo only).
+#ifndef STB_VORBIS_MAX_CHANNELS
+#define STB_VORBIS_MAX_CHANNELS    16  // enough for anyone?
+#endif
+
+// STB_VORBIS_PUSHDATA_CRC_COUNT [number]
+//     after a flush_pushdata(), stb_vorbis begins scanning for the
+//     next valid page, without backtracking. when it finds something
+//     that looks like a page, it streams through it and verifies its
+//     CRC32. Should that validation fail, it keeps scanning. But it's
+//     possible that _while_ streaming through to check the CRC32 of
+//     one candidate page, it sees another candidate page. This #define
+//     determines how many "overlapping" candidate pages it can search
+//     at once. Note that "real" pages are typically ~4KB to ~8KB, whereas
+//     garbage pages could be as big as 64KB, but probably average ~16KB.
+//     So don't hose ourselves by scanning an apparent 64KB page and
+//     missing a ton of real ones in the interim; so minimum of 2
+#ifndef STB_VORBIS_PUSHDATA_CRC_COUNT
+#define STB_VORBIS_PUSHDATA_CRC_COUNT  4
+#endif
+
+// STB_VORBIS_FAST_HUFFMAN_LENGTH [number]
+//     sets the log size of the huffman-acceleration table.  Maximum
+//     supported value is 24. with larger numbers, more decodings are O(1),
+//     but the table size is larger so cache misses get worse, so you'll have
+//     to probe (and try multiple ogg vorbis files) to find the sweet spot.
+#ifndef STB_VORBIS_FAST_HUFFMAN_LENGTH
+#define STB_VORBIS_FAST_HUFFMAN_LENGTH   10
+#endif
+
+// STB_VORBIS_FAST_BINARY_LENGTH [number]
+//     sets the log size of the binary-search acceleration table. this
+//     is used in similar fashion to the fast-huffman size to set initial
+//     parameters for the binary search
+
+// STB_VORBIS_FAST_HUFFMAN_INT
+//     The fast huffman tables are much more efficient if they can be
+//     stored as 16-bit results instead of 32-bit results. This restricts
+//     the codebooks to having only 65535 possible outcomes, though.
+//     (At least, accelerated by the huffman table.)
+#ifndef STB_VORBIS_FAST_HUFFMAN_INT
+#define STB_VORBIS_FAST_HUFFMAN_SHORT
+#endif
+
+// STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+//     If the 'fast huffman' search doesn't succeed, then stb_vorbis falls
+//     back on binary searching for the correct one. This requires storing
+//     extra tables with the huffman codes in sorted order. Defining this
+//     symbol trades off space for speed by forcing a linear search in the
+//     non-fast case, except for "sparse" codebooks.
+// #define STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+
+// STB_VORBIS_DIVIDES_IN_RESIDUE
+//     stb_vorbis precomputes the result of the scalar residue decoding
+//     that would otherwise require a divide per chunk. you can trade off
+//     space for time by defining this symbol.
+// #define STB_VORBIS_DIVIDES_IN_RESIDUE
+
+// STB_VORBIS_DIVIDES_IN_CODEBOOK
+//     vorbis VQ codebooks can be encoded two ways: with every case explicitly
+//     stored, or with all elements being chosen from a small range of values,
+//     and all values possible in all elements. By default, stb_vorbis expands
+//     this latter kind out to look like the former kind for ease of decoding,
+//     because otherwise an integer divide-per-vector-element is required to
+//     unpack the index. If you define STB_VORBIS_DIVIDES_IN_CODEBOOK, you can
+//     trade off storage for speed.
+//#define STB_VORBIS_DIVIDES_IN_CODEBOOK
+
+#ifdef STB_VORBIS_CODEBOOK_SHORTS
+#error "STB_VORBIS_CODEBOOK_SHORTS is no longer supported as it produced incorrect results for some input formats"
+#endif
+
+// STB_VORBIS_DIVIDE_TABLE
+//     this replaces small integer divides in the floor decode loop with
+//     table lookups. made less than 1% difference, so disabled by default.
+
+// STB_VORBIS_NO_INLINE_DECODE
+//     disables the inlining of the scalar codebook fast-huffman decode.
+//     might save a little codespace; useful for debugging
+// #define STB_VORBIS_NO_INLINE_DECODE
+
+// STB_VORBIS_NO_DEFER_FLOOR
+//     Normally we only decode the floor without synthesizing the actual
+//     full curve. We can instead synthesize the curve immediately. This
+//     requires more memory and is very likely slower, so I don't think
+//     you'd ever want to do it except for debugging.
+// #define STB_VORBIS_NO_DEFER_FLOOR
+
+
+
+
+//////////////////////////////////////////////////////////////////////////////
+
+#ifdef STB_VORBIS_NO_PULLDATA_API
+   #define STB_VORBIS_NO_INTEGER_CONVERSION
+   #define STB_VORBIS_NO_STDIO
+#endif
+
+#if defined(STB_VORBIS_NO_CRT) && !defined(STB_VORBIS_NO_STDIO)
+   #define STB_VORBIS_NO_STDIO 1
+#endif
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+#ifndef STB_VORBIS_NO_FAST_SCALED_FLOAT
+
+   // only need endianness for fast-float-to-int, which we don't
+   // use for pushdata
+
+   #ifndef STB_VORBIS_BIG_ENDIAN
+     #define STB_VORBIS_ENDIAN  0
+   #else
+     #define STB_VORBIS_ENDIAN  1
+   #endif
+
+#endif
+#endif
+
+
+#ifndef STB_VORBIS_NO_STDIO
+#include <stdio.h>
+#endif
+
+#ifndef STB_VORBIS_NO_CRT
+   #include <stdlib.h>
+   #include <string.h>
+   #include <assert.h>
+   #include <math.h>
+
+   // find definition of alloca if it's not in stdlib.h:
+   #if defined(_MSC_VER) || defined(__MINGW32__)
+      #include <malloc.h>
+   #endif
+   #if defined(__linux__) || defined(__linux) || defined(__sun__) || defined(__EMSCRIPTEN__) || defined(__NEWLIB__)
+      #include <alloca.h>
+   #endif
+#else // STB_VORBIS_NO_CRT
+   #define NULL 0
+   #define malloc(s)   0
+   #define free(s)     ((void) 0)
+   #define realloc(s)  0
+#endif // STB_VORBIS_NO_CRT
+
+#include <limits.h>
+
+#ifdef __MINGW32__
+   // eff you mingw:
+   //     "fixed":
+   //         http://sourceforge.net/p/mingw-w64/mailman/message/32882927/
+   //     "no that broke the build, reverted, who cares about C":
+   //         http://sourceforge.net/p/mingw-w64/mailman/message/32890381/
+   #ifdef __forceinline
+   #undef __forceinline
+   #endif
+   #define __forceinline
+   #ifndef alloca
+   #define alloca __builtin_alloca
+   #endif
+#elif !defined(_MSC_VER)
+   #if __GNUC__
+      #define __forceinline inline
+   #else
+      #define __forceinline
+   #endif
+#endif
+
+#if STB_VORBIS_MAX_CHANNELS > 256
+#error "Value of STB_VORBIS_MAX_CHANNELS outside of allowed range"
+#endif
+
+#if STB_VORBIS_FAST_HUFFMAN_LENGTH > 24
+#error "Value of STB_VORBIS_FAST_HUFFMAN_LENGTH outside of allowed range"
+#endif
+
+
+#if 0
+#include <crtdbg.h>
+#define CHECK(f)   _CrtIsValidHeapPointer(f->channel_buffers[1])
+#else
+#define CHECK(f)   ((void) 0)
+#endif
+
+#define MAX_BLOCKSIZE_LOG  13   // from specification
+#define MAX_BLOCKSIZE      (1 << MAX_BLOCKSIZE_LOG)
+
+
+typedef unsigned char  uint8;
+typedef   signed char   int8;
+typedef unsigned short uint16;
+typedef   signed short  int16;
+typedef unsigned int   uint32;
+typedef   signed int    int32;
+
+#ifndef TRUE
+#define TRUE 1
+#define FALSE 0
+#endif
+
+typedef float codetype;
+
+#ifdef _MSC_VER
+#define STBV_NOTUSED(v)  (void)(v)
+#else
+#define STBV_NOTUSED(v)  (void)sizeof(v)
+#endif
+
+// @NOTE
+//
+// Some arrays below are tagged "//varies", which means it's actually
+// a variable-sized piece of data, but rather than malloc I assume it's
+// small enough it's better to just allocate it all together with the
+// main thing
+//
+// Most of the variables are specified with the smallest size I could pack
+// them into. It might give better performance to make them all full-sized
+// integers. It should be safe to freely rearrange the structures or change
+// the sizes larger--nothing relies on silently truncating etc., nor the
+// order of variables.
+
+#define FAST_HUFFMAN_TABLE_SIZE   (1 << STB_VORBIS_FAST_HUFFMAN_LENGTH)
+#define FAST_HUFFMAN_TABLE_MASK   (FAST_HUFFMAN_TABLE_SIZE - 1)
+
+typedef struct
+{
+   int dimensions, entries;
+   uint8 *codeword_lengths;
+   float  minimum_value;
+   float  delta_value;
+   uint8  value_bits;
+   uint8  lookup_type;
+   uint8  sequence_p;
+   uint8  sparse;
+   uint32 lookup_values;
+   codetype *multiplicands;
+   uint32 *codewords;
+   #ifdef STB_VORBIS_FAST_HUFFMAN_SHORT
+    int16  fast_huffman[FAST_HUFFMAN_TABLE_SIZE];
+   #else
+    int32  fast_huffman[FAST_HUFFMAN_TABLE_SIZE];
+   #endif
+   uint32 *sorted_codewords;
+   int    *sorted_values;
+   int     sorted_entries;
+} Codebook;
+
+typedef struct
+{
+   uint8 order;
+   uint16 rate;
+   uint16 bark_map_size;
+   uint8 amplitude_bits;
+   uint8 amplitude_offset;
+   uint8 number_of_books;
+   uint8 book_list[16]; // varies
+} Floor0;
+
+typedef struct
+{
+   uint8 partitions;
+   uint8 partition_class_list[32]; // varies
+   uint8 class_dimensions[16]; // varies
+   uint8 class_subclasses[16]; // varies
+   uint8 class_masterbooks[16]; // varies
+   int16 subclass_books[16][8]; // varies
+   uint16 Xlist[31*8+2]; // varies
+   uint8 sorted_order[31*8+2];
+   uint8 neighbors[31*8+2][2];
+   uint8 floor1_multiplier;
+   uint8 rangebits;
+   int values;
+} Floor1;
+
+typedef union
+{
+   Floor0 floor0;
+   Floor1 floor1;
+} Floor;
+
+typedef struct
+{
+   uint32 begin, end;
+   uint32 part_size;
+   uint8 classifications;
+   uint8 classbook;
+   uint8 **classdata;
+   int16 (*residue_books)[8];
+} Residue;
+
+typedef struct
+{
+   uint8 magnitude;
+   uint8 angle;
+   uint8 mux;
+} MappingChannel;
+
+typedef struct
+{
+   uint16 coupling_steps;
+   MappingChannel *chan;
+   uint8  submaps;
+   uint8  submap_floor[15]; // varies
+   uint8  submap_residue[15]; // varies
+} Mapping;
+
+typedef struct
+{
+   uint8 blockflag;
+   uint8 mapping;
+   uint16 windowtype;
+   uint16 transformtype;
+} Mode;
+
+typedef struct
+{
+   uint32  goal_crc;    // expected crc if match
+   int     bytes_left;  // bytes left in packet
+   uint32  crc_so_far;  // running crc
+   int     bytes_done;  // bytes processed in _current_ chunk
+   uint32  sample_loc;  // granule pos encoded in page
+} CRCscan;
+
+typedef struct
+{
+   uint32 page_start, page_end;
+   uint32 last_decoded_sample;
+} ProbedPage;
+
+struct stb_vorbis
+{
+  // user-accessible info
+   unsigned int sample_rate;
+   int channels;
+
+   unsigned int setup_memory_required;
+   unsigned int temp_memory_required;
+   unsigned int setup_temp_memory_required;
+
+   char *vendor;
+   int comment_list_length;
+   char **comment_list;
+
+  // input config
+#ifndef STB_VORBIS_NO_STDIO
+   FILE *f;
+   uint32 f_start;
+   int close_on_free;
+#endif
+
+   uint8 *stream;
+   uint8 *stream_start;
+   uint8 *stream_end;
+
+   uint32 stream_len;
+
+   uint8  push_mode;
+
+   // the page to seek to when seeking to start, may be zero
+   uint32 first_audio_page_offset;
+
+   // p_first is the page on which the first audio packet ends
+   // (but not necessarily the page on which it starts)
+   ProbedPage p_first, p_last;
+
+  // memory management
+   stb_vorbis_alloc alloc;
+   int setup_offset;
+   int temp_offset;
+
+  // run-time results
+   int eof;
+   enum STBVorbisError error;
+
+  // user-useful data
+
+  // header info
+   int blocksize[2];
+   int blocksize_0, blocksize_1;
+   int codebook_count;
+   Codebook *codebooks;
+   int floor_count;
+   uint16 floor_types[64]; // varies
+   Floor *floor_config;
+   int residue_count;
+   uint16 residue_types[64]; // varies
+   Residue *residue_config;
+   int mapping_count;
+   Mapping *mapping;
+   int mode_count;
+   Mode mode_config[64];  // varies
+
+   uint32 total_samples;
+
+  // decode buffer
+   float *channel_buffers[STB_VORBIS_MAX_CHANNELS];
+   float *outputs        [STB_VORBIS_MAX_CHANNELS];
+
+   float *previous_window[STB_VORBIS_MAX_CHANNELS];
+   int previous_length;
+
+   #ifndef STB_VORBIS_NO_DEFER_FLOOR
+   int16 *finalY[STB_VORBIS_MAX_CHANNELS];
+   #else
+   float *floor_buffers[STB_VORBIS_MAX_CHANNELS];
+   #endif
+
+   uint32 current_loc; // sample location of next frame to decode
+   int    current_loc_valid;
+
+  // per-blocksize precomputed data
+
+   // twiddle factors
+   float *A[2],*B[2],*C[2];
+   float *window[2];
+   uint16 *bit_reverse[2];
+
+  // current page/packet/segment streaming info
+   uint32 serial; // stream serial number for verification
+   int last_page;
+   int segment_count;
+   uint8 segments[255];
+   uint8 page_flag;
+   uint8 bytes_in_seg;
+   uint8 first_decode;
+   int next_seg;
+   int last_seg;  // flag that we're on the last segment
+   int last_seg_which; // what was the segment number of the last seg?
+   uint32 acc;
+   int valid_bits;
+   int packet_bytes;
+   int end_seg_with_known_loc;
+   uint32 known_loc_for_packet;
+   int discard_samples_deferred;
+   uint32 samples_output;
+
+  // push mode scanning
+   int page_crc_tests; // only in push_mode: number of tests active; -1 if not searching
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+   CRCscan scan[STB_VORBIS_PUSHDATA_CRC_COUNT];
+#endif
+
+  // sample-access
+   int channel_buffer_start;
+   int channel_buffer_end;
+};
+
+#if defined(STB_VORBIS_NO_PUSHDATA_API)
+   #define IS_PUSH_MODE(f)   FALSE
+#elif defined(STB_VORBIS_NO_PULLDATA_API)
+   #define IS_PUSH_MODE(f)   TRUE
+#else
+   #define IS_PUSH_MODE(f)   ((f)->push_mode)
+#endif
+
+typedef struct stb_vorbis vorb;
+
+static int error(vorb *f, enum STBVorbisError e)
+{
+   f->error = e;
+   if (!f->eof && e != VORBIS_need_more_data) {
+      f->error=e; // breakpoint for debugging
+   }
+   return 0;
+}
+
+
+// these functions are used for allocating temporary memory
+// while decoding. if you can afford the stack space, use
+// alloca(); otherwise, provide a temp buffer and they will
+// allocate out of it.
+
+#define array_size_required(count,size)  ((count)*(sizeof(void *)+(size)))
+
+#define temp_alloc(f,size)              (f->alloc.alloc_buffer ? setup_temp_malloc(f,size) : alloca(size))
+#define temp_free(f,p)                  (void)0
+#define temp_alloc_save(f)              ((f)->temp_offset)
+#define temp_alloc_restore(f,p)         ((f)->temp_offset = (p))
+
+#define temp_block_array(f,count,size)  make_block_array(temp_alloc(f,array_size_required(count,size)), count, size)
+
+// given a sufficiently large block of memory, make an array of pointers to subblocks of it
+static void *make_block_array(void *mem, int count, int size)
+{
+   int i;
+   void ** p = (void **) mem;
+   char *q = (char *) (p + count);
+   for (i=0; i < count; ++i) {
+      p[i] = q;
+      q += size;
+   }
+   return p;
+}
+
+static void *setup_malloc(vorb *f, int sz)
+{
+   sz = (sz+7) & ~7; // round up to nearest 8 for alignment of future allocs.
+   f->setup_memory_required += sz;
+   if (f->alloc.alloc_buffer) {
+      void *p = (char *) f->alloc.alloc_buffer + f->setup_offset;
+      if (f->setup_offset + sz > f->temp_offset) return NULL;
+      f->setup_offset += sz;
+      return p;
+   }
+   return sz ? malloc(sz) : NULL;
+}
+
+static void setup_free(vorb *f, void *p)
+{
+   if (f->alloc.alloc_buffer) return; // do nothing; setup mem is a stack
+   free(p);
+}
+
+static void *setup_temp_malloc(vorb *f, int sz)
+{
+   sz = (sz+7) & ~7; // round up to nearest 8 for alignment of future allocs.
+   if (f->alloc.alloc_buffer) {
+      if (f->temp_offset - sz < f->setup_offset) return NULL;
+      f->temp_offset -= sz;
+      return (char *) f->alloc.alloc_buffer + f->temp_offset;
+   }
+   return malloc(sz);
+}
+
+static void setup_temp_free(vorb *f, void *p, int sz)
+{
+   if (f->alloc.alloc_buffer) {
+      f->temp_offset += (sz+7)&~7;
+      return;
+   }
+   free(p);
+}
+
+#define CRC32_POLY    0x04c11db7   // from spec
+
+static uint32 crc_table[256];
+static void crc32_init(void)
+{
+   int i,j;
+   uint32 s;
+   for(i=0; i < 256; i++) {
+      for (s=(uint32) i << 24, j=0; j < 8; ++j)
+         s = (s << 1) ^ (s >= (1U<<31) ? CRC32_POLY : 0);
+      crc_table[i] = s;
+   }
+}
+
+static __forceinline uint32 crc32_update(uint32 crc, uint8 byte)
+{
+   return (crc << 8) ^ crc_table[byte ^ (crc >> 24)];
+}
+
+
+// used in setup, and for huffman that doesn't go fast path
+static unsigned int bit_reverse(unsigned int n)
+{
+  n = ((n & 0xAAAAAAAA) >>  1) | ((n & 0x55555555) << 1);
+  n = ((n & 0xCCCCCCCC) >>  2) | ((n & 0x33333333) << 2);
+  n = ((n & 0xF0F0F0F0) >>  4) | ((n & 0x0F0F0F0F) << 4);
+  n = ((n & 0xFF00FF00) >>  8) | ((n & 0x00FF00FF) << 8);
+  return (n >> 16) | (n << 16);
+}
+
+static float square(float x)
+{
+   return x*x;
+}
+
+// this is a weird definition of log2() for which log2(1) = 1, log2(2) = 2, log2(4) = 3
+// as required by the specification. fast(?) implementation from stb.h
+// @OPTIMIZE: called multiple times per-packet with "constants"; move to setup
+static int ilog(int32 n)
+{
+   static signed char log2_4[16] = { 0,1,2,2,3,3,3,3,4,4,4,4,4,4,4,4 };
+
+   if (n < 0) return 0; // negative n returns 0
+
+   // 2 compares if n < 16, 3 compares otherwise (4 if signed or n > 1<<29)
+   if (n < (1 << 14))
+        if (n < (1 <<  4))            return  0 + log2_4[n      ];
+        else if (n < (1 <<  9))       return  5 + log2_4[n >>  5];
+             else                     return 10 + log2_4[n >> 10];
+   else if (n < (1 << 24))
+             if (n < (1 << 19))       return 15 + log2_4[n >> 15];
+             else                     return 20 + log2_4[n >> 20];
+        else if (n < (1 << 29))       return 25 + log2_4[n >> 25];
+             else                     return 30 + log2_4[n >> 30];
+}
+
+#ifndef M_PI
+  #define M_PI  3.14159265358979323846264f  // from CRC
+#endif
+
+// code length assigned to a value with no huffman encoding
+#define NO_CODE   255
+
+/////////////////////// LEAF SETUP FUNCTIONS //////////////////////////
+//
+// these functions are only called at setup, and only a few times
+// per file
+
+static float float32_unpack(uint32 x)
+{
+   // from the specification
+   uint32 mantissa = x & 0x1fffff;
+   uint32 sign = x & 0x80000000;
+   uint32 exp = (x & 0x7fe00000) >> 21;
+   double res = sign ? -(double)mantissa : (double)mantissa;
+   return (float) ldexp((float)res, (int)exp-788);
+}
+
+
+// zlib & jpeg huffman tables assume that the output symbols
+// can either be arbitrarily arranged, or have monotonically
+// increasing frequencies--they rely on the lengths being sorted;
+// this makes for a very simple generation algorithm.
+// vorbis allows a huffman table with non-sorted lengths. This
+// requires a more sophisticated construction, since symbols in
+// order do not map to huffman codes "in order".
+static void add_entry(Codebook *c, uint32 huff_code, int symbol, int count, int len, uint32 *values)
+{
+   if (!c->sparse) {
+      c->codewords      [symbol] = huff_code;
+   } else {
+      c->codewords       [count] = huff_code;
+      c->codeword_lengths[count] = len;
+      values             [count] = symbol;
+   }
+}
+
+static int compute_codewords(Codebook *c, uint8 *len, int n, uint32 *values)
+{
+   int i,k,m=0;
+   uint32 available[32];
+
+   memset(available, 0, sizeof(available));
+   // find the first entry
+   for (k=0; k < n; ++k) if (len[k] < NO_CODE) break;
+   if (k == n) { assert(c->sorted_entries == 0); return TRUE; }
+   assert(len[k] < 32); // no error return required, code reading lens checks this
+   // add to the list
+   add_entry(c, 0, k, m++, len[k], values);
+   // add all available leaves
+   for (i=1; i <= len[k]; ++i)
+      available[i] = 1U << (32-i);
+   // note that the above code treats the first case specially,
+   // but it's really the same as the following code, so they
+   // could probably be combined (except the initial code is 0,
+   // and I use 0 in available[] to mean 'empty')
+   for (i=k+1; i < n; ++i) {
+      uint32 res;
+      int z = len[i], y;
+      if (z == NO_CODE) continue;
+      assert(z < 32); // no error return required, code reading lens checks this
+      // find lowest available leaf (should always be earliest,
+      // which is what the specification calls for)
+      // note that this property, and the fact we can never have
+      // more than one free leaf at a given level, isn't totally
+      // trivial to prove, but it seems true and the assert never
+      // fires, so!
+      while (z > 0 && !available[z]) --z;
+      if (z == 0) { return FALSE; }
+      res = available[z];
+      available[z] = 0;
+      add_entry(c, bit_reverse(res), i, m++, len[i], values);
+      // propagate availability up the tree
+      if (z != len[i]) {
+         for (y=len[i]; y > z; --y) {
+            assert(available[y] == 0);
+            available[y] = res + (1 << (32-y));
+         }
+      }
+   }
+   return TRUE;
+}
+
+// accelerated huffman table allows fast O(1) match of all symbols
+// of length <= STB_VORBIS_FAST_HUFFMAN_LENGTH
+static void compute_accelerated_huffman(Codebook *c)
+{
+   int i, len;
+   for (i=0; i < FAST_HUFFMAN_TABLE_SIZE; ++i)
+      c->fast_huffman[i] = -1;
+
+   len = c->sparse ? c->sorted_entries : c->entries;
+   #ifdef STB_VORBIS_FAST_HUFFMAN_SHORT
+   if (len > 32767) len = 32767; // largest possible value we can encode!
+   #endif
+   for (i=0; i < len; ++i) {
+      if (c->codeword_lengths[i] <= STB_VORBIS_FAST_HUFFMAN_LENGTH) {
+         uint32 z = c->sparse ? bit_reverse(c->sorted_codewords[i]) : c->codewords[i];
+         // set table entries for all bit combinations in the higher bits
+         while (z < FAST_HUFFMAN_TABLE_SIZE) {
+             c->fast_huffman[z] = i;
+             z += 1 << c->codeword_lengths[i];
+         }
+      }
+   }
+}
+
+#ifdef _MSC_VER
+#define STBV_CDECL __cdecl
+#else
+#define STBV_CDECL
+#endif
+
+static int STBV_CDECL uint32_compare(const void *p, const void *q)
+{
+   uint32 x = * (uint32 *) p;
+   uint32 y = * (uint32 *) q;
+   return x < y ? -1 : x > y;
+}
+
+static int include_in_sort(Codebook *c, uint8 len)
+{
+   if (c->sparse) { assert(len != NO_CODE); return TRUE; }
+   if (len == NO_CODE) return FALSE;
+   if (len > STB_VORBIS_FAST_HUFFMAN_LENGTH) return TRUE;
+   return FALSE;
+}
+
+// if the fast table above doesn't work, we want to binary
+// search them... need to reverse the bits
+static void compute_sorted_huffman(Codebook *c, uint8 *lengths, uint32 *values)
+{
+   int i, len;
+   // build a list of all the entries
+   // OPTIMIZATION: don't include the short ones, since they'll be caught by FAST_HUFFMAN.
+   // this is kind of a frivolous optimization--I don't see any performance improvement,
+   // but it's like 4 extra lines of code, so.
+   if (!c->sparse) {
+      int k = 0;
+      for (i=0; i < c->entries; ++i)
+         if (include_in_sort(c, lengths[i]))
+            c->sorted_codewords[k++] = bit_reverse(c->codewords[i]);
+      assert(k == c->sorted_entries);
+   } else {
+      for (i=0; i < c->sorted_entries; ++i)
+         c->sorted_codewords[i] = bit_reverse(c->codewords[i]);
+   }
+
+   qsort(c->sorted_codewords, c->sorted_entries, sizeof(c->sorted_codewords[0]), uint32_compare);
+   c->sorted_codewords[c->sorted_entries] = 0xffffffff;
+
+   len = c->sparse ? c->sorted_entries : c->entries;
+   // now we need to indicate how they correspond; we could either
+   //   #1: sort a different data structure that says who they correspond to
+   //   #2: for each sorted entry, search the original list to find who corresponds
+   //   #3: for each original entry, find the sorted entry
+   // #1 requires extra storage, #2 is slow, #3 can use binary search!
+   for (i=0; i < len; ++i) {
+      int huff_len = c->sparse ? lengths[values[i]] : lengths[i];
+      if (include_in_sort(c,huff_len)) {
+         uint32 code = bit_reverse(c->codewords[i]);
+         int x=0, n=c->sorted_entries;
+         while (n > 1) {
+            // invariant: sc[x] <= code < sc[x+n]
+            int m = x + (n >> 1);
+            if (c->sorted_codewords[m] <= code) {
+               x = m;
+               n -= (n>>1);
+            } else {
+               n >>= 1;
+            }
+         }
+         assert(c->sorted_codewords[x] == code);
+         if (c->sparse) {
+            c->sorted_values[x] = values[i];
+            c->codeword_lengths[x] = huff_len;
+         } else {
+            c->sorted_values[x] = i;
+         }
+      }
+   }
+}
+
+// only run while parsing the header (3 times)
+static int vorbis_validate(uint8 *data)
+{
+   static uint8 vorbis[6] = { 'v', 'o', 'r', 'b', 'i', 's' };
+   return memcmp(data, vorbis, 6) == 0;
+}
+
+// called from setup only, once per code book
+// (formula implied by specification)
+static int lookup1_values(int entries, int dim)
+{
+   int r = (int) floor(exp((float) log((float) entries) / dim));
+   if ((int) floor(pow((float) r+1, dim)) <= entries)   // (int) cast for MinGW warning;
+      ++r;                                              // floor() to avoid _ftol() when non-CRT
+   if (pow((float) r+1, dim) <= entries)
+      return -1;
+   if ((int) floor(pow((float) r, dim)) > entries)
+      return -1;
+   return r;
+}
+
+// called twice per file
+static void compute_twiddle_factors(int n, float *A, float *B, float *C)
+{
+   int n4 = n >> 2, n8 = n >> 3;
+   int k,k2;
+
+   for (k=k2=0; k < n4; ++k,k2+=2) {
+      A[k2  ] = (float)  cos(4*k*M_PI/n);
+      A[k2+1] = (float) -sin(4*k*M_PI/n);
+      B[k2  ] = (float)  cos((k2+1)*M_PI/n/2) * 0.5f;
+      B[k2+1] = (float)  sin((k2+1)*M_PI/n/2) * 0.5f;
+   }
+   for (k=k2=0; k < n8; ++k,k2+=2) {
+      C[k2  ] = (float)  cos(2*(k2+1)*M_PI/n);
+      C[k2+1] = (float) -sin(2*(k2+1)*M_PI/n);
+   }
+}
+
+static void compute_window(int n, float *window)
+{
+   int n2 = n >> 1, i;
+   for (i=0; i < n2; ++i)
+      window[i] = (float) sin(0.5 * M_PI * square((float) sin((i - 0 + 0.5) / n2 * 0.5 * M_PI)));
+}
+
+static void compute_bitreverse(int n, uint16 *rev)
+{
+   int ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+   int i, n8 = n >> 3;
+   for (i=0; i < n8; ++i)
+      rev[i] = (bit_reverse(i) >> (32-ld+3)) << 2;
+}
+
+static int init_blocksize(vorb *f, int b, int n)
+{
+   int n2 = n >> 1, n4 = n >> 2, n8 = n >> 3;
+   f->A[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+   f->B[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+   f->C[b] = (float *) setup_malloc(f, sizeof(float) * n4);
+   if (!f->A[b] || !f->B[b] || !f->C[b]) return error(f, VORBIS_outofmem);
+   compute_twiddle_factors(n, f->A[b], f->B[b], f->C[b]);
+   f->window[b] = (float *) setup_malloc(f, sizeof(float) * n2);
+   if (!f->window[b]) return error(f, VORBIS_outofmem);
+   compute_window(n, f->window[b]);
+   f->bit_reverse[b] = (uint16 *) setup_malloc(f, sizeof(uint16) * n8);
+   if (!f->bit_reverse[b]) return error(f, VORBIS_outofmem);
+   compute_bitreverse(n, f->bit_reverse[b]);
+   return TRUE;
+}
+
+static void neighbors(uint16 *x, int n, int *plow, int *phigh)
+{
+   int low = -1;
+   int high = 65536;
+   int i;
+   for (i=0; i < n; ++i) {
+      if (x[i] > low  && x[i] < x[n]) { *plow  = i; low = x[i]; }
+      if (x[i] < high && x[i] > x[n]) { *phigh = i; high = x[i]; }
+   }
+}
+
+// the second field used to be y; it has been repurposed to hold the original index (id)
+typedef struct
+{
+   uint16 x,id;
+} stbv__floor_ordering;
+
+static int STBV_CDECL point_compare(const void *p, const void *q)
+{
+   stbv__floor_ordering *a = (stbv__floor_ordering *) p;
+   stbv__floor_ordering *b = (stbv__floor_ordering *) q;
+   return a->x < b->x ? -1 : a->x > b->x;
+}
+
+//
+/////////////////////// END LEAF SETUP FUNCTIONS //////////////////////////
+
+
+#if defined(STB_VORBIS_NO_STDIO)
+   #define USE_MEMORY(z)    TRUE
+#else
+   #define USE_MEMORY(z)    ((z)->stream)
+#endif
+
+static uint8 get8(vorb *z)
+{
+   if (USE_MEMORY(z)) {
+      if (z->stream >= z->stream_end) { z->eof = TRUE; return 0; }
+      return *z->stream++;
+   }
+
+   #ifndef STB_VORBIS_NO_STDIO
+   {
+   int c = fgetc(z->f);
+   if (c == EOF) { z->eof = TRUE; return 0; }
+   return c;
+   }
+   #endif
+}
+
+static uint32 get32(vorb *f)
+{
+   uint32 x;
+   x = get8(f);
+   x += get8(f) << 8;
+   x += get8(f) << 16;
+   x += (uint32) get8(f) << 24;
+   return x;
+}
+
+static int getn(vorb *z, uint8 *data, int n)
+{
+   if (USE_MEMORY(z)) {
+      if (z->stream+n > z->stream_end) { z->eof = 1; return 0; }
+      memcpy(data, z->stream, n);
+      z->stream += n;
+      return 1;
+   }
+
+   #ifndef STB_VORBIS_NO_STDIO
+   if (fread(data, n, 1, z->f) == 1)
+      return 1;
+   else {
+      z->eof = 1;
+      return 0;
+   }
+   #endif
+}
+
+static void skip(vorb *z, int n)
+{
+   if (USE_MEMORY(z)) {
+      z->stream += n;
+      if (z->stream >= z->stream_end) z->eof = 1;
+      return;
+   }
+   #ifndef STB_VORBIS_NO_STDIO
+   {
+      long x = ftell(z->f);
+      fseek(z->f, x+n, SEEK_SET);
+   }
+   #endif
+}
+
+static int set_file_offset(stb_vorbis *f, unsigned int loc)
+{
+   #ifndef STB_VORBIS_NO_PUSHDATA_API
+   if (f->push_mode) return 0;
+   #endif
+   f->eof = 0;
+   if (USE_MEMORY(f)) {
+      if (f->stream_start + loc >= f->stream_end || f->stream_start + loc < f->stream_start) {
+         f->stream = f->stream_end;
+         f->eof = 1;
+         return 0;
+      } else {
+         f->stream = f->stream_start + loc;
+         return 1;
+      }
+   }
+   #ifndef STB_VORBIS_NO_STDIO
+   if (loc + f->f_start < loc || loc >= 0x80000000) {
+      loc = 0x7fffffff;
+      f->eof = 1;
+   } else {
+      loc += f->f_start;
+   }
+   if (!fseek(f->f, loc, SEEK_SET))
+      return 1;
+   f->eof = 1;
+   fseek(f->f, f->f_start, SEEK_END);
+   return 0;
+   #endif
+}
+
+
+static uint8 ogg_page_header[4] = { 0x4f, 0x67, 0x67, 0x53 };
+
+static int capture_pattern(vorb *f)
+{
+   if (0x4f != get8(f)) return FALSE;
+   if (0x67 != get8(f)) return FALSE;
+   if (0x67 != get8(f)) return FALSE;
+   if (0x53 != get8(f)) return FALSE;
+   return TRUE;
+}
+
+#define PAGEFLAG_continued_packet   1
+#define PAGEFLAG_first_page         2
+#define PAGEFLAG_last_page          4
+
+static int start_page_no_capturepattern(vorb *f)
+{
+   uint32 loc0,loc1,n;
+   if (f->first_decode && !IS_PUSH_MODE(f)) {
+      f->p_first.page_start = stb_vorbis_get_file_offset(f) - 4;
+   }
+   // stream structure version
+   if (0 != get8(f)) return error(f, VORBIS_invalid_stream_structure_version);
+   // header flag
+   f->page_flag = get8(f);
+   // absolute granule position
+   loc0 = get32(f);
+   loc1 = get32(f);
+   // @TODO: validate loc0,loc1 as valid positions?
+   // stream serial number -- vorbis doesn't interleave, so discard
+   get32(f);
+   //if (f->serial != get32(f)) return error(f, VORBIS_incorrect_stream_serial_number);
+   // page sequence number
+   n = get32(f);
+   f->last_page = n;
+   // CRC32
+   get32(f);
+   // page_segments
+   f->segment_count = get8(f);
+   if (!getn(f, f->segments, f->segment_count))
+      return error(f, VORBIS_unexpected_eof);
+   // assume we _don't_ know the sample position of any segments
+   f->end_seg_with_known_loc = -2;
+   if (loc0 != ~0U || loc1 != ~0U) {
+      int i;
+      // determine which packet is the last one that will complete
+      for (i=f->segment_count-1; i >= 0; --i)
+         if (f->segments[i] < 255)
+            break;
+      // 'i' is now the index of the _last_ segment of a packet that ends
+      if (i >= 0) {
+         f->end_seg_with_known_loc = i;
+         f->known_loc_for_packet   = loc0;
+      }
+   }
+   if (f->first_decode) {
+      int i,len;
+      len = 0;
+      for (i=0; i < f->segment_count; ++i)
+         len += f->segments[i];
+      len += 27 + f->segment_count;
+      f->p_first.page_end = f->p_first.page_start + len;
+      f->p_first.last_decoded_sample = loc0;
+   }
+   f->next_seg = 0;
+   return TRUE;
+}
+
+static int start_page(vorb *f)
+{
+   if (!capture_pattern(f)) return error(f, VORBIS_missing_capture_pattern);
+   return start_page_no_capturepattern(f);
+}
+
+static int start_packet(vorb *f)
+{
+   while (f->next_seg == -1) {
+      if (!start_page(f)) return FALSE;
+      if (f->page_flag & PAGEFLAG_continued_packet)
+         return error(f, VORBIS_continued_packet_flag_invalid);
+   }
+   f->last_seg = FALSE;
+   f->valid_bits = 0;
+   f->packet_bytes = 0;
+   f->bytes_in_seg = 0;
+   // f->next_seg is now valid
+   return TRUE;
+}
+
+static int maybe_start_packet(vorb *f)
+{
+   if (f->next_seg == -1) {
+      int x = get8(f);
+      if (f->eof) return FALSE; // EOF at page boundary is not an error!
+      if (0x4f != x      ) return error(f, VORBIS_missing_capture_pattern);
+      if (0x67 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+      if (0x67 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+      if (0x53 != get8(f)) return error(f, VORBIS_missing_capture_pattern);
+      if (!start_page_no_capturepattern(f)) return FALSE;
+      if (f->page_flag & PAGEFLAG_continued_packet) {
+         // set up enough state that we can read this packet if we want,
+         // e.g. during recovery
+         f->last_seg = FALSE;
+         f->bytes_in_seg = 0;
+         return error(f, VORBIS_continued_packet_flag_invalid);
+      }
+   }
+   return start_packet(f);
+}
+
+static int next_segment(vorb *f)
+{
+   int len;
+   if (f->last_seg) return 0;
+   if (f->next_seg == -1) {
+      f->last_seg_which = f->segment_count-1; // in case start_page fails
+      if (!start_page(f)) { f->last_seg = 1; return 0; }
+      if (!(f->page_flag & PAGEFLAG_continued_packet)) return error(f, VORBIS_continued_packet_flag_invalid);
+   }
+   len = f->segments[f->next_seg++];
+   if (len < 255) {
+      f->last_seg = TRUE;
+      f->last_seg_which = f->next_seg-1;
+   }
+   if (f->next_seg >= f->segment_count)
+      f->next_seg = -1;
+   assert(f->bytes_in_seg == 0);
+   f->bytes_in_seg = len;
+   return len;
+}
+
+#define EOP    (-1)
+#define INVALID_BITS  (-1)
+
+static int get8_packet_raw(vorb *f)
+{
+   if (!f->bytes_in_seg) {  // CLANG!
+      if (f->last_seg) return EOP;
+      else if (!next_segment(f)) return EOP;
+   }
+   assert(f->bytes_in_seg > 0);
+   --f->bytes_in_seg;
+   ++f->packet_bytes;
+   return get8(f);
+}
+
+static int get8_packet(vorb *f)
+{
+   int x = get8_packet_raw(f);
+   f->valid_bits = 0;
+   return x;
+}
+
+static int get32_packet(vorb *f)
+{
+   uint32 x;
+   x = get8_packet(f);
+   x += get8_packet(f) << 8;
+   x += get8_packet(f) << 16;
+   x += (uint32) get8_packet(f) << 24;
+   return x;
+}
+
+static void flush_packet(vorb *f)
+{
+   while (get8_packet_raw(f) != EOP);
+}
+
+// @OPTIMIZE: this is the secondary bit decoder, so it's probably not as important
+// as the huffman decoder?
+static uint32 get_bits(vorb *f, int n)
+{
+   uint32 z;
+
+   if (f->valid_bits < 0) return 0;
+   if (f->valid_bits < n) {
+      if (n > 24) {
+         // the accumulator technique below would not work correctly in this case
+         z = get_bits(f, 24);
+         z += get_bits(f, n-24) << 24;
+         return z;
+      }
+      if (f->valid_bits == 0) f->acc = 0;
+      while (f->valid_bits < n) {
+         int z = get8_packet_raw(f);
+         if (z == EOP) {
+            f->valid_bits = INVALID_BITS;
+            return 0;
+         }
+         f->acc += z << f->valid_bits;
+         f->valid_bits += 8;
+      }
+   }
+
+   assert(f->valid_bits >= n);
+   z = f->acc & ((1 << n)-1);
+   f->acc >>= n;
+   f->valid_bits -= n;
+   return z;
+}
+
+// @OPTIMIZE: primary accumulator for huffman
+// expand the buffer to as many bits as possible without reading off end of packet
+// it might be nice to allow f->valid_bits and f->acc to be stored in registers,
+// e.g. cache them locally and decode locally
+static __forceinline void prep_huffman(vorb *f)
+{
+   if (f->valid_bits <= 24) {
+      if (f->valid_bits == 0) f->acc = 0;
+      do {
+         int z;
+         if (f->last_seg && !f->bytes_in_seg) return;
+         z = get8_packet_raw(f);
+         if (z == EOP) return;
+         f->acc += (unsigned) z << f->valid_bits;
+         f->valid_bits += 8;
+      } while (f->valid_bits <= 24);
+   }
+}
+
+enum
+{
+   VORBIS_packet_id = 1,
+   VORBIS_packet_comment = 3,
+   VORBIS_packet_setup = 5
+};
+
+static int codebook_decode_scalar_raw(vorb *f, Codebook *c)
+{
+   int i;
+   prep_huffman(f);
+
+   if (c->codewords == NULL && c->sorted_codewords == NULL)
+      return -1;
+
+   // cases to use binary search: sorted_codewords && !c->codewords
+   //                             sorted_codewords && c->entries > 8
+   if (c->entries > 8 ? c->sorted_codewords!=NULL : !c->codewords) {
+      // binary search
+      uint32 code = bit_reverse(f->acc);
+      int x=0, n=c->sorted_entries, len;
+
+      while (n > 1) {
+         // invariant: sc[x] <= code < sc[x+n]
+         int m = x + (n >> 1);
+         if (c->sorted_codewords[m] <= code) {
+            x = m;
+            n -= (n>>1);
+         } else {
+            n >>= 1;
+         }
+      }
+      // x is now the sorted index
+      if (!c->sparse) x = c->sorted_values[x];
+      // x is now sorted index if sparse, or symbol otherwise
+      len = c->codeword_lengths[x];
+      if (f->valid_bits >= len) {
+         f->acc >>= len;
+         f->valid_bits -= len;
+         return x;
+      }
+
+      f->valid_bits = 0;
+      return -1;
+   }
+
+   // if small, linear search
+   assert(!c->sparse);
+   for (i=0; i < c->entries; ++i) {
+      if (c->codeword_lengths[i] == NO_CODE) continue;
+      if (c->codewords[i] == (f->acc & ((1 << c->codeword_lengths[i])-1))) {
+         if (f->valid_bits >= c->codeword_lengths[i]) {
+            f->acc >>= c->codeword_lengths[i];
+            f->valid_bits -= c->codeword_lengths[i];
+            return i;
+         }
+         f->valid_bits = 0;
+         return -1;
+      }
+   }
+
+   error(f, VORBIS_invalid_stream);
+   f->valid_bits = 0;
+   return -1;
+}
+
+#ifndef STB_VORBIS_NO_INLINE_DECODE
+
+#define DECODE_RAW(var, f,c)                                  \
+   if (f->valid_bits < STB_VORBIS_FAST_HUFFMAN_LENGTH)        \
+      prep_huffman(f);                                        \
+   var = f->acc & FAST_HUFFMAN_TABLE_MASK;                    \
+   var = c->fast_huffman[var];                                \
+   if (var >= 0) {                                            \
+      int n = c->codeword_lengths[var];                       \
+      f->acc >>= n;                                           \
+      f->valid_bits -= n;                                     \
+      if (f->valid_bits < 0) { f->valid_bits = 0; var = -1; } \
+   } else {                                                   \
+      var = codebook_decode_scalar_raw(f,c);                  \
+   }
+
+#else
+
+static int codebook_decode_scalar(vorb *f, Codebook *c)
+{
+   int i;
+   if (f->valid_bits < STB_VORBIS_FAST_HUFFMAN_LENGTH)
+      prep_huffman(f);
+   // fast huffman table lookup
+   i = f->acc & FAST_HUFFMAN_TABLE_MASK;
+   i = c->fast_huffman[i];
+   if (i >= 0) {
+      f->acc >>= c->codeword_lengths[i];
+      f->valid_bits -= c->codeword_lengths[i];
+      if (f->valid_bits < 0) { f->valid_bits = 0; return -1; }
+      return i;
+   }
+   return codebook_decode_scalar_raw(f,c);
+}
+
+#define DECODE_RAW(var,f,c)    var = codebook_decode_scalar(f,c);
+
+#endif
+
+#define DECODE(var,f,c)                                       \
+   DECODE_RAW(var,f,c)                                        \
+   if (c->sparse) var = c->sorted_values[var];
+
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+  #define DECODE_VQ(var,f,c)   DECODE_RAW(var,f,c)
+#else
+  #define DECODE_VQ(var,f,c)   DECODE(var,f,c)
+#endif
+
+
+
+
+
+
+// CODEBOOK_ELEMENT_FAST is an optimization for the CODEBOOK_FLOATS case
+// where we avoid one addition
+#define CODEBOOK_ELEMENT(c,off)          (c->multiplicands[off])
+#define CODEBOOK_ELEMENT_FAST(c,off)     (c->multiplicands[off])
+#define CODEBOOK_ELEMENT_BASE(c)         (0)
+
+static int codebook_decode_start(vorb *f, Codebook *c)
+{
+   int z = -1;
+
+   // type 0 is only legal in a scalar context
+   if (c->lookup_type == 0)
+      error(f, VORBIS_invalid_stream);
+   else {
+      DECODE_VQ(z,f,c);
+      if (c->sparse) assert(z < c->sorted_entries);
+      if (z < 0) {  // check for EOP
+         if (!f->bytes_in_seg)
+            if (f->last_seg)
+               return z;
+         error(f, VORBIS_invalid_stream);
+      }
+   }
+   return z;
+}
+
+static int codebook_decode(vorb *f, Codebook *c, float *output, int len)
+{
+   int i,z = codebook_decode_start(f,c);
+   if (z < 0) return FALSE;
+   if (len > c->dimensions) len = c->dimensions;
+
+#ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+   if (c->lookup_type == 1) {
+      float last = CODEBOOK_ELEMENT_BASE(c);
+      int div = 1;
+      for (i=0; i < len; ++i) {
+         int off = (z / div) % c->lookup_values;
+         float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+         output[i] += val;
+         if (c->sequence_p) last = val + c->minimum_value;
+         div *= c->lookup_values;
+      }
+      return TRUE;
+   }
+#endif
+
+   z *= c->dimensions;
+   if (c->sequence_p) {
+      float last = CODEBOOK_ELEMENT_BASE(c);
+      for (i=0; i < len; ++i) {
+         float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+         output[i] += val;
+         last = val + c->minimum_value;
+      }
+   } else {
+      float last = CODEBOOK_ELEMENT_BASE(c);
+      for (i=0; i < len; ++i) {
+         output[i] += CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+      }
+   }
+
+   return TRUE;
+}
+
+static int codebook_decode_step(vorb *f, Codebook *c, float *output, int len, int step)
+{
+   int i,z = codebook_decode_start(f,c);
+   float last = CODEBOOK_ELEMENT_BASE(c);
+   if (z < 0) return FALSE;
+   if (len > c->dimensions) len = c->dimensions;
+
+#ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+   if (c->lookup_type == 1) {
+      int div = 1;
+      for (i=0; i < len; ++i) {
+         int off = (z / div) % c->lookup_values;
+         float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+         output[i*step] += val;
+         if (c->sequence_p) last = val;
+         div *= c->lookup_values;
+      }
+      return TRUE;
+   }
+#endif
+
+   z *= c->dimensions;
+   for (i=0; i < len; ++i) {
+      float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+      output[i*step] += val;
+      if (c->sequence_p) last = val;
+   }
+
+   return TRUE;
+}
+
+static int codebook_decode_deinterleave_repeat(vorb *f, Codebook *c, float **outputs, int ch, int *c_inter_p, int *p_inter_p, int len, int total_decode)
+{
+   int c_inter = *c_inter_p;
+   int p_inter = *p_inter_p;
+   int i,z, effective = c->dimensions;
+
+   // type 0 is only legal in a scalar context
+   if (c->lookup_type == 0)   return error(f, VORBIS_invalid_stream);
+
+   while (total_decode > 0) {
+      float last = CODEBOOK_ELEMENT_BASE(c);
+      DECODE_VQ(z,f,c);
+      #ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+      assert(!c->sparse || z < c->sorted_entries);
+      #endif
+      if (z < 0) {
+         if (!f->bytes_in_seg)
+            if (f->last_seg) return FALSE;
+         return error(f, VORBIS_invalid_stream);
+      }
+
+      // if this will take us off the end of the buffers, stop short!
+      // we check by computing the length of the virtual interleaved
+      // buffer (len*ch), our current offset within it (p_inter*ch)+(c_inter),
+      // and the length we'll be using (effective)
+      if (c_inter + p_inter*ch + effective > len * ch) {
+         effective = len*ch - (p_inter*ch + c_inter);
+      }
+
+   #ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+      if (c->lookup_type == 1) {
+         int div = 1;
+         for (i=0; i < effective; ++i) {
+            int off = (z / div) % c->lookup_values;
+            float val = CODEBOOK_ELEMENT_FAST(c,off) + last;
+            if (outputs[c_inter])
+               outputs[c_inter][p_inter] += val;
+            if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+            if (c->sequence_p) last = val;
+            div *= c->lookup_values;
+         }
+      } else
+   #endif
+      {
+         z *= c->dimensions;
+         if (c->sequence_p) {
+            for (i=0; i < effective; ++i) {
+               float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+               if (outputs[c_inter])
+                  outputs[c_inter][p_inter] += val;
+               if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+               last = val;
+            }
+         } else {
+            for (i=0; i < effective; ++i) {
+               float val = CODEBOOK_ELEMENT_FAST(c,z+i) + last;
+               if (outputs[c_inter])
+                  outputs[c_inter][p_inter] += val;
+               if (++c_inter == ch) { c_inter = 0; ++p_inter; }
+            }
+         }
+      }
+
+      total_decode -= effective;
+   }
+   *c_inter_p = c_inter;
+   *p_inter_p = p_inter;
+   return TRUE;
+}
+
+static int predict_point(int x, int x0, int x1, int y0, int y1)
+{
+   int dy = y1 - y0;
+   int adx = x1 - x0;
+   // @OPTIMIZE: force int division to round in the right direction... is this necessary on x86?
+   int err = abs(dy) * (x - x0);
+   int off = err / adx;
+   return dy < 0 ? y0 - off : y0 + off;
+}
+
+// the following table is block-copied from the specification
+static float inverse_db_table[256] =
+{
+  1.0649863e-07f, 1.1341951e-07f, 1.2079015e-07f, 1.2863978e-07f,
+  1.3699951e-07f, 1.4590251e-07f, 1.5538408e-07f, 1.6548181e-07f,
+  1.7623575e-07f, 1.8768855e-07f, 1.9988561e-07f, 2.1287530e-07f,
+  2.2670913e-07f, 2.4144197e-07f, 2.5713223e-07f, 2.7384213e-07f,
+  2.9163793e-07f, 3.1059021e-07f, 3.3077411e-07f, 3.5226968e-07f,
+  3.7516214e-07f, 3.9954229e-07f, 4.2550680e-07f, 4.5315863e-07f,
+  4.8260743e-07f, 5.1396998e-07f, 5.4737065e-07f, 5.8294187e-07f,
+  6.2082472e-07f, 6.6116941e-07f, 7.0413592e-07f, 7.4989464e-07f,
+  7.9862701e-07f, 8.5052630e-07f, 9.0579828e-07f, 9.6466216e-07f,
+  1.0273513e-06f, 1.0941144e-06f, 1.1652161e-06f, 1.2409384e-06f,
+  1.3215816e-06f, 1.4074654e-06f, 1.4989305e-06f, 1.5963394e-06f,
+  1.7000785e-06f, 1.8105592e-06f, 1.9282195e-06f, 2.0535261e-06f,
+  2.1869758e-06f, 2.3290978e-06f, 2.4804557e-06f, 2.6416497e-06f,
+  2.8133190e-06f, 2.9961443e-06f, 3.1908506e-06f, 3.3982101e-06f,
+  3.6190449e-06f, 3.8542308e-06f, 4.1047004e-06f, 4.3714470e-06f,
+  4.6555282e-06f, 4.9580707e-06f, 5.2802740e-06f, 5.6234160e-06f,
+  5.9888572e-06f, 6.3780469e-06f, 6.7925283e-06f, 7.2339451e-06f,
+  7.7040476e-06f, 8.2047000e-06f, 8.7378876e-06f, 9.3057248e-06f,
+  9.9104632e-06f, 1.0554501e-05f, 1.1240392e-05f, 1.1970856e-05f,
+  1.2748789e-05f, 1.3577278e-05f, 1.4459606e-05f, 1.5399272e-05f,
+  1.6400004e-05f, 1.7465768e-05f, 1.8600792e-05f, 1.9809576e-05f,
+  2.1096914e-05f, 2.2467911e-05f, 2.3928002e-05f, 2.5482978e-05f,
+  2.7139006e-05f, 2.8902651e-05f, 3.0780908e-05f, 3.2781225e-05f,
+  3.4911534e-05f, 3.7180282e-05f, 3.9596466e-05f, 4.2169667e-05f,
+  4.4910090e-05f, 4.7828601e-05f, 5.0936773e-05f, 5.4246931e-05f,
+  5.7772202e-05f, 6.1526565e-05f, 6.5524908e-05f, 6.9783085e-05f,
+  7.4317983e-05f, 7.9147585e-05f, 8.4291040e-05f, 8.9768747e-05f,
+  9.5602426e-05f, 0.00010181521f, 0.00010843174f, 0.00011547824f,
+  0.00012298267f, 0.00013097477f, 0.00013948625f, 0.00014855085f,
+  0.00015820453f, 0.00016848555f, 0.00017943469f, 0.00019109536f,
+  0.00020351382f, 0.00021673929f, 0.00023082423f, 0.00024582449f,
+  0.00026179955f, 0.00027881276f, 0.00029693158f, 0.00031622787f,
+  0.00033677814f, 0.00035866388f, 0.00038197188f, 0.00040679456f,
+  0.00043323036f, 0.00046138411f, 0.00049136745f, 0.00052329927f,
+  0.00055730621f, 0.00059352311f, 0.00063209358f, 0.00067317058f,
+  0.00071691700f, 0.00076350630f, 0.00081312324f, 0.00086596457f,
+  0.00092223983f, 0.00098217216f, 0.0010459992f,  0.0011139742f,
+  0.0011863665f,  0.0012634633f,  0.0013455702f,  0.0014330129f,
+  0.0015261382f,  0.0016253153f,  0.0017309374f,  0.0018434235f,
+  0.0019632195f,  0.0020908006f,  0.0022266726f,  0.0023713743f,
+  0.0025254795f,  0.0026895994f,  0.0028643847f,  0.0030505286f,
+  0.0032487691f,  0.0034598925f,  0.0036847358f,  0.0039241906f,
+  0.0041792066f,  0.0044507950f,  0.0047400328f,  0.0050480668f,
+  0.0053761186f,  0.0057254891f,  0.0060975636f,  0.0064938176f,
+  0.0069158225f,  0.0073652516f,  0.0078438871f,  0.0083536271f,
+  0.0088964928f,  0.009474637f,   0.010090352f,   0.010746080f,
+  0.011444421f,   0.012188144f,   0.012980198f,   0.013823725f,
+  0.014722068f,   0.015678791f,   0.016697687f,   0.017782797f,
+  0.018938423f,   0.020169149f,   0.021479854f,   0.022875735f,
+  0.024362330f,   0.025945531f,   0.027631618f,   0.029427276f,
+  0.031339626f,   0.033376252f,   0.035545228f,   0.037855157f,
+  0.040315199f,   0.042935108f,   0.045725273f,   0.048696758f,
+  0.051861348f,   0.055231591f,   0.058820850f,   0.062643361f,
+  0.066714279f,   0.071049749f,   0.075666962f,   0.080584227f,
+  0.085821044f,   0.091398179f,   0.097337747f,   0.10366330f,
+  0.11039993f,    0.11757434f,    0.12521498f,    0.13335215f,
+  0.14201813f,    0.15124727f,    0.16107617f,    0.17154380f,
+  0.18269168f,    0.19456402f,    0.20720788f,    0.22067342f,
+  0.23501402f,    0.25028656f,    0.26655159f,    0.28387361f,
+  0.30232132f,    0.32196786f,    0.34289114f,    0.36517414f,
+  0.38890521f,    0.41417847f,    0.44109412f,    0.46975890f,
+  0.50028648f,    0.53279791f,    0.56742212f,    0.60429640f,
+  0.64356699f,    0.68538959f,    0.72993007f,    0.77736504f,
+  0.82788260f,    0.88168307f,    0.9389798f,     1.0f
+};
+
+
+// @OPTIMIZE: if you want to replace this bresenham line-drawing routine,
+// note that you must produce bit-identical output to decode correctly;
+// this specific sequence of operations is specified in the spec (it's
+// drawing integer-quantized frequency-space lines that the encoder
+// expects to be exactly the same)
+//     ... also, isn't the whole point of Bresenham's algorithm to NOT
+// have to divide in the setup? sigh.
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+#define LINE_OP(a,b)   a *= b
+#else
+#define LINE_OP(a,b)   a = b
+#endif
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+#define DIVTAB_NUMER   32
+#define DIVTAB_DENOM   64
+int8 integer_divide_table[DIVTAB_NUMER][DIVTAB_DENOM]; // 2KB
+#endif
+
+static __forceinline void draw_line(float *output, int x0, int y0, int x1, int y1, int n)
+{
+   int dy = y1 - y0;
+   int adx = x1 - x0;
+   int ady = abs(dy);
+   int base;
+   int x=x0,y=y0;
+   int err = 0;
+   int sy;
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+   if (adx < DIVTAB_DENOM && ady < DIVTAB_NUMER) {
+      if (dy < 0) {
+         base = -integer_divide_table[ady][adx];
+         sy = base-1;
+      } else {
+         base =  integer_divide_table[ady][adx];
+         sy = base+1;
+      }
+   } else {
+      base = dy / adx;
+      if (dy < 0)
+         sy = base - 1;
+      else
+         sy = base+1;
+   }
+#else
+   base = dy / adx;
+   if (dy < 0)
+      sy = base - 1;
+   else
+      sy = base+1;
+#endif
+   ady -= abs(base) * adx;
+   if (x1 > n) x1 = n;
+   if (x < x1) {
+      LINE_OP(output[x], inverse_db_table[y&255]);
+      for (++x; x < x1; ++x) {
+         err += ady;
+         if (err >= adx) {
+            err -= adx;
+            y += sy;
+         } else
+            y += base;
+         LINE_OP(output[x], inverse_db_table[y&255]);
+      }
+   }
+}
+
+static int residue_decode(vorb *f, Codebook *book, float *target, int offset, int n, int rtype)
+{
+   int k;
+   if (rtype == 0) {
+      int step = n / book->dimensions;
+      for (k=0; k < step; ++k)
+         if (!codebook_decode_step(f, book, target+offset+k, n-offset-k, step))
+            return FALSE;
+   } else {
+      for (k=0; k < n; ) {
+         if (!codebook_decode(f, book, target+offset, n-k))
+            return FALSE;
+         k += book->dimensions;
+         offset += book->dimensions;
+      }
+   }
+   return TRUE;
+}
+
+// n is 1/2 of the blocksize --
+// specification: "Correct per-vector decode length is [n]/2"
+static void decode_residue(vorb *f, float *residue_buffers[], int ch, int n, int rn, uint8 *do_not_decode)
+{
+   int i,j,pass;
+   Residue *r = f->residue_config + rn;
+   int rtype = f->residue_types[rn];
+   int c = r->classbook;
+   int classwords = f->codebooks[c].dimensions;
+   unsigned int actual_size = rtype == 2 ? n*2 : n;
+   unsigned int limit_r_begin = (r->begin < actual_size ? r->begin : actual_size);
+   unsigned int limit_r_end   = (r->end   < actual_size ? r->end   : actual_size);
+   int n_read = limit_r_end - limit_r_begin;
+   int part_read = n_read / r->part_size;
+   int temp_alloc_point = temp_alloc_save(f);
+   #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+   uint8 ***part_classdata = (uint8 ***) temp_block_array(f,f->channels, part_read * sizeof(**part_classdata));
+   #else
+   int **classifications = (int **) temp_block_array(f,f->channels, part_read * sizeof(**classifications));
+   #endif
+
+   CHECK(f);
+
+   for (i=0; i < ch; ++i)
+      if (!do_not_decode[i])
+         memset(residue_buffers[i], 0, sizeof(float) * n);
+
+   if (rtype == 2 && ch != 1) {
+      for (j=0; j < ch; ++j)
+         if (!do_not_decode[j])
+            break;
+      if (j == ch)
+         goto done;
+
+      for (pass=0; pass < 8; ++pass) {
+         int pcount = 0, class_set = 0;
+         if (ch == 2) {
+            while (pcount < part_read) {
+               int z = r->begin + pcount*r->part_size;
+               int c_inter = (z & 1), p_inter = z>>1;
+               if (pass == 0) {
+                  Codebook *c = f->codebooks+r->classbook;
+                  int q;
+                  DECODE(q,f,c);
+                  if (q == EOP) goto done;
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  part_classdata[0][class_set] = r->classdata[q];
+                  #else
+                  for (i=classwords-1; i >= 0; --i) {
+                     classifications[0][i+pcount] = q % r->classifications;
+                     q /= r->classifications;
+                  }
+                  #endif
+               }
+               for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+                  int z = r->begin + pcount*r->part_size;
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  int c = part_classdata[0][class_set][i];
+                  #else
+                  int c = classifications[0][pcount];
+                  #endif
+                  int b = r->residue_books[c][pass];
+                  if (b >= 0) {
+                     Codebook *book = f->codebooks + b;
+                     #ifdef STB_VORBIS_DIVIDES_IN_CODEBOOK
+                     if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+                        goto done;
+                     #else
+                     // "saves 1%" no longer applies: both branches make the same call
+                     if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+                        goto done;
+                     #endif
+                  } else {
+                     z += r->part_size;
+                     c_inter = z & 1;
+                     p_inter = z >> 1;
+                  }
+               }
+               #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+               ++class_set;
+               #endif
+            }
+         } else if (ch > 2) {
+            while (pcount < part_read) {
+               int z = r->begin + pcount*r->part_size;
+               int c_inter = z % ch, p_inter = z/ch;
+               if (pass == 0) {
+                  Codebook *c = f->codebooks+r->classbook;
+                  int q;
+                  DECODE(q,f,c);
+                  if (q == EOP) goto done;
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  part_classdata[0][class_set] = r->classdata[q];
+                  #else
+                  for (i=classwords-1; i >= 0; --i) {
+                     classifications[0][i+pcount] = q % r->classifications;
+                     q /= r->classifications;
+                  }
+                  #endif
+               }
+               for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+                  int z = r->begin + pcount*r->part_size;
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  int c = part_classdata[0][class_set][i];
+                  #else
+                  int c = classifications[0][pcount];
+                  #endif
+                  int b = r->residue_books[c][pass];
+                  if (b >= 0) {
+                     Codebook *book = f->codebooks + b;
+                     if (!codebook_decode_deinterleave_repeat(f, book, residue_buffers, ch, &c_inter, &p_inter, n, r->part_size))
+                        goto done;
+                  } else {
+                     z += r->part_size;
+                     c_inter = z % ch;
+                     p_inter = z / ch;
+                  }
+               }
+               #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+               ++class_set;
+               #endif
+            }
+         }
+      }
+      goto done;
+   }
+   CHECK(f);
+
+   for (pass=0; pass < 8; ++pass) {
+      int pcount = 0, class_set=0;
+      while (pcount < part_read) {
+         if (pass == 0) {
+            for (j=0; j < ch; ++j) {
+               if (!do_not_decode[j]) {
+                  Codebook *c = f->codebooks+r->classbook;
+                  int temp;
+                  DECODE(temp,f,c);
+                  if (temp == EOP) goto done;
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  part_classdata[j][class_set] = r->classdata[temp];
+                  #else
+                  for (i=classwords-1; i >= 0; --i) {
+                     classifications[j][i+pcount] = temp % r->classifications;
+                     temp /= r->classifications;
+                  }
+                  #endif
+               }
+            }
+         }
+         for (i=0; i < classwords && pcount < part_read; ++i, ++pcount) {
+            for (j=0; j < ch; ++j) {
+               if (!do_not_decode[j]) {
+                  #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+                  int c = part_classdata[j][class_set][i];
+                  #else
+                  int c = classifications[j][pcount];
+                  #endif
+                  int b = r->residue_books[c][pass];
+                  if (b >= 0) {
+                     float *target = residue_buffers[j];
+                     int offset = r->begin + pcount * r->part_size;
+                     int n = r->part_size;
+                     Codebook *book = f->codebooks + b;
+                     if (!residue_decode(f, book, target, offset, n, rtype))
+                        goto done;
+                  }
+               }
+            }
+         }
+         #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+         ++class_set;
+         #endif
+      }
+   }
+  done:
+   CHECK(f);
+   #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+   temp_free(f,part_classdata);
+   #else
+   temp_free(f,classifications);
+   #endif
+   temp_alloc_restore(f,temp_alloc_point);
+}
+
+
+#if 0
+// slow way for debugging
+void inverse_mdct_slow(float *buffer, int n)
+{
+   int i,j;
+   int n2 = n >> 1;
+   float *x = (float *) malloc(sizeof(*x) * n2);
+   memcpy(x, buffer, sizeof(*x) * n2);
+   for (i=0; i < n; ++i) {
+      float acc = 0;
+      for (j=0; j < n2; ++j)
+         // formula from paper:
+         //acc += n/4.0f * x[j] * (float) cos(M_PI / 2 / n * (2 * i + 1 + n/2.0)*(2*j+1));
+         // formula from wikipedia
+         //acc += 2.0f / n2 * x[j] * (float) cos(M_PI/n2 * (i + 0.5 + n2/2)*(j + 0.5));
+         // these are equivalent, except the formula from the paper inverts the multiplier!
+         // however, what actually works is NO MULTIPLIER!?!
+         //acc += 64 * 2.0f / n2 * x[j] * (float) cos(M_PI/n2 * (i + 0.5 + n2/2)*(j + 0.5));
+         acc += x[j] * (float) cos(M_PI / 2 / n * (2 * i + 1 + n/2.0)*(2*j+1));
+      buffer[i] = acc;
+   }
+   free(x);
+}
+#elif 0
+// same as above, but just barely able to run in real time on modern machines
+void inverse_mdct_slow(float *buffer, int n, vorb *f, int blocktype)
+{
+   float mcos[16384];
+   int i,j;
+   int n2 = n >> 1, nmask = (n << 2) -1;
+   float *x = (float *) malloc(sizeof(*x) * n2);
+   memcpy(x, buffer, sizeof(*x) * n2);
+   for (i=0; i < 4*n; ++i)
+      mcos[i] = (float) cos(M_PI / 2 * i / n);
+
+   for (i=0; i < n; ++i) {
+      float acc = 0;
+      for (j=0; j < n2; ++j)
+         acc += x[j] * mcos[(2 * i + 1 + n2)*(2*j+1) & nmask];
+      buffer[i] = acc;
+   }
+   free(x);
+}
+#elif 0
+// transform to use a slow dct-iv; this is STILL basically trivial,
+// but only requires half as many ops
+void dct_iv_slow(float *buffer, int n)
+{
+   float mcos[16384];
+   float x[2048];
+   int i,j;
+   int n2 = n >> 1, nmask = (n << 3) - 1;
+   memcpy(x, buffer, sizeof(*x) * n);
+   for (i=0; i < 8*n; ++i)
+      mcos[i] = (float) cos(M_PI / 4 * i / n);
+   for (i=0; i < n; ++i) {
+      float acc = 0;
+      for (j=0; j < n; ++j)
+         acc += x[j] * mcos[((2 * i + 1)*(2*j+1)) & nmask];
+      buffer[i] = acc;
+   }
+}
+
+void inverse_mdct_slow(float *buffer, int n, vorb *f, int blocktype)
+{
+   int i, n4 = n >> 2, n2 = n >> 1, n3_4 = n - n4;
+   float temp[4096];
+
+   memcpy(temp, buffer, n2 * sizeof(float));
+   dct_iv_slow(temp, n2);  // returns -c'-d, a-b'
+
+   for (i=0; i < n4  ; ++i) buffer[i] = temp[i+n4];            // a-b'
+   for (   ; i < n3_4; ++i) buffer[i] = -temp[n3_4 - i - 1];   // b-a', c+d'
+   for (   ; i < n   ; ++i) buffer[i] = -temp[i - n3_4];       // c'+d
+}
+#endif
+
+#ifndef LIBVORBIS_MDCT
+#define LIBVORBIS_MDCT 0
+#endif
+
+#if LIBVORBIS_MDCT
+// directly call the vorbis MDCT using an interface documented
+// by Jeff Roberts... useful for performance comparison
+typedef struct
+{
+  int n;
+  int log2n;
+
+  float *trig;
+  int   *bitrev;
+
+  float scale;
+} mdct_lookup;
+
+extern void mdct_init(mdct_lookup *lookup, int n);
+extern void mdct_clear(mdct_lookup *l);
+extern void mdct_backward(mdct_lookup *init, float *in, float *out);
+
+mdct_lookup M1,M2;
+
+void inverse_mdct(float *buffer, int n, vorb *f, int blocktype)
+{
+   mdct_lookup *M;
+   if (M1.n == n) M = &M1;
+   else if (M2.n == n) M = &M2;
+   else if (M1.n == 0) { mdct_init(&M1, n); M = &M1; }
+   else {
+      if (M2.n) __asm int 3;
+      mdct_init(&M2, n);
+      M = &M2;
+   }
+
+   mdct_backward(M, buffer, buffer);
+}
+#endif
+
+
+// the following were split out into separate functions while optimizing;
+// they could be pushed back up but eh. __forceinline showed no change;
+// they're probably already being inlined.
+static void imdct_step3_iter0_loop(int n, float *e, int i_off, int k_off, float *A)
+{
+   float *ee0 = e + i_off;
+   float *ee2 = ee0 + k_off;
+   int i;
+
+   assert((n & 3) == 0);
+   for (i=(n>>2); i > 0; --i) {
+      float k00_20, k01_21;
+      k00_20  = ee0[ 0] - ee2[ 0];
+      k01_21  = ee0[-1] - ee2[-1];
+      ee0[ 0] += ee2[ 0];//ee0[ 0] = ee0[ 0] + ee2[ 0];
+      ee0[-1] += ee2[-1];//ee0[-1] = ee0[-1] + ee2[-1];
+      ee2[ 0] = k00_20 * A[0] - k01_21 * A[1];
+      ee2[-1] = k01_21 * A[0] + k00_20 * A[1];
+      A += 8;
+
+      k00_20  = ee0[-2] - ee2[-2];
+      k01_21  = ee0[-3] - ee2[-3];
+      ee0[-2] += ee2[-2];//ee0[-2] = ee0[-2] + ee2[-2];
+      ee0[-3] += ee2[-3];//ee0[-3] = ee0[-3] + ee2[-3];
+      ee2[-2] = k00_20 * A[0] - k01_21 * A[1];
+      ee2[-3] = k01_21 * A[0] + k00_20 * A[1];
+      A += 8;
+
+      k00_20  = ee0[-4] - ee2[-4];
+      k01_21  = ee0[-5] - ee2[-5];
+      ee0[-4] += ee2[-4];//ee0[-4] = ee0[-4] + ee2[-4];
+      ee0[-5] += ee2[-5];//ee0[-5] = ee0[-5] + ee2[-5];
+      ee2[-4] = k00_20 * A[0] - k01_21 * A[1];
+      ee2[-5] = k01_21 * A[0] + k00_20 * A[1];
+      A += 8;
+
+      k00_20  = ee0[-6] - ee2[-6];
+      k01_21  = ee0[-7] - ee2[-7];
+      ee0[-6] += ee2[-6];//ee0[-6] = ee0[-6] + ee2[-6];
+      ee0[-7] += ee2[-7];//ee0[-7] = ee0[-7] + ee2[-7];
+      ee2[-6] = k00_20 * A[0] - k01_21 * A[1];
+      ee2[-7] = k01_21 * A[0] + k00_20 * A[1];
+      A += 8;
+      ee0 -= 8;
+      ee2 -= 8;
+   }
+}
+
+static void imdct_step3_inner_r_loop(int lim, float *e, int d0, int k_off, float *A, int k1)
+{
+   int i;
+   float k00_20, k01_21;
+
+   float *e0 = e + d0;
+   float *e2 = e0 + k_off;
+
+   for (i=lim >> 2; i > 0; --i) {
+      k00_20 = e0[-0] - e2[-0];
+      k01_21 = e0[-1] - e2[-1];
+      e0[-0] += e2[-0];//e0[-0] = e0[-0] + e2[-0];
+      e0[-1] += e2[-1];//e0[-1] = e0[-1] + e2[-1];
+      e2[-0] = (k00_20)*A[0] - (k01_21) * A[1];
+      e2[-1] = (k01_21)*A[0] + (k00_20) * A[1];
+
+      A += k1;
+
+      k00_20 = e0[-2] - e2[-2];
+      k01_21 = e0[-3] - e2[-3];
+      e0[-2] += e2[-2];//e0[-2] = e0[-2] + e2[-2];
+      e0[-3] += e2[-3];//e0[-3] = e0[-3] + e2[-3];
+      e2[-2] = (k00_20)*A[0] - (k01_21) * A[1];
+      e2[-3] = (k01_21)*A[0] + (k00_20) * A[1];
+
+      A += k1;
+
+      k00_20 = e0[-4] - e2[-4];
+      k01_21 = e0[-5] - e2[-5];
+      e0[-4] += e2[-4];//e0[-4] = e0[-4] + e2[-4];
+      e0[-5] += e2[-5];//e0[-5] = e0[-5] + e2[-5];
+      e2[-4] = (k00_20)*A[0] - (k01_21) * A[1];
+      e2[-5] = (k01_21)*A[0] + (k00_20) * A[1];
+
+      A += k1;
+
+      k00_20 = e0[-6] - e2[-6];
+      k01_21 = e0[-7] - e2[-7];
+      e0[-6] += e2[-6];//e0[-6] = e0[-6] + e2[-6];
+      e0[-7] += e2[-7];//e0[-7] = e0[-7] + e2[-7];
+      e2[-6] = (k00_20)*A[0] - (k01_21) * A[1];
+      e2[-7] = (k01_21)*A[0] + (k00_20) * A[1];
+
+      e0 -= 8;
+      e2 -= 8;
+
+      A += k1;
+   }
+}
+
+static void imdct_step3_inner_s_loop(int n, float *e, int i_off, int k_off, float *A, int a_off, int k0)
+{
+   int i;
+   float A0 = A[0];
+   float A1 = A[0+1];
+   float A2 = A[0+a_off];
+   float A3 = A[0+a_off+1];
+   float A4 = A[0+a_off*2+0];
+   float A5 = A[0+a_off*2+1];
+   float A6 = A[0+a_off*3+0];
+   float A7 = A[0+a_off*3+1];
+
+   float k00,k11;
+
+   float *ee0 = e  +i_off;
+   float *ee2 = ee0+k_off;
+
+   for (i=n; i > 0; --i) {
+      k00     = ee0[ 0] - ee2[ 0];
+      k11     = ee0[-1] - ee2[-1];
+      ee0[ 0] =  ee0[ 0] + ee2[ 0];
+      ee0[-1] =  ee0[-1] + ee2[-1];
+      ee2[ 0] = (k00) * A0 - (k11) * A1;
+      ee2[-1] = (k11) * A0 + (k00) * A1;
+
+      k00     = ee0[-2] - ee2[-2];
+      k11     = ee0[-3] - ee2[-3];
+      ee0[-2] =  ee0[-2] + ee2[-2];
+      ee0[-3] =  ee0[-3] + ee2[-3];
+      ee2[-2] = (k00) * A2 - (k11) * A3;
+      ee2[-3] = (k11) * A2 + (k00) * A3;
+
+      k00     = ee0[-4] - ee2[-4];
+      k11     = ee0[-5] - ee2[-5];
+      ee0[-4] =  ee0[-4] + ee2[-4];
+      ee0[-5] =  ee0[-5] + ee2[-5];
+      ee2[-4] = (k00) * A4 - (k11) * A5;
+      ee2[-5] = (k11) * A4 + (k00) * A5;
+
+      k00     = ee0[-6] - ee2[-6];
+      k11     = ee0[-7] - ee2[-7];
+      ee0[-6] =  ee0[-6] + ee2[-6];
+      ee0[-7] =  ee0[-7] + ee2[-7];
+      ee2[-6] = (k00) * A6 - (k11) * A7;
+      ee2[-7] = (k11) * A6 + (k00) * A7;
+
+      ee0 -= k0;
+      ee2 -= k0;
+   }
+}
+
+static __forceinline void iter_54(float *z)
+{
+   float k00,k11,k22,k33;
+   float y0,y1,y2,y3;
+
+   k00  = z[ 0] - z[-4];
+   y0   = z[ 0] + z[-4];
+   y2   = z[-2] + z[-6];
+   k22  = z[-2] - z[-6];
+
+   z[-0] = y0 + y2;      // z0 + z4 + z2 + z6
+   z[-2] = y0 - y2;      // z0 + z4 - z2 - z6
+
+   // done with y0,y2
+
+   k33  = z[-3] - z[-7];
+
+   z[-4] = k00 + k33;    // z0 - z4 + z3 - z7
+   z[-6] = k00 - k33;    // z0 - z4 - z3 + z7
+
+   // done with k33
+
+   k11  = z[-1] - z[-5];
+   y1   = z[-1] + z[-5];
+   y3   = z[-3] + z[-7];
+
+   z[-1] = y1 + y3;      // z1 + z5 + z3 + z7
+   z[-3] = y1 - y3;      // z1 + z5 - z3 - z7
+   z[-5] = k11 - k22;    // z1 - z5 + z2 - z6
+   z[-7] = k11 + k22;    // z1 - z5 - z2 + z6
+}
+
+static void imdct_step3_inner_s_loop_ld654(int n, float *e, int i_off, float *A, int base_n)
+{
+   int a_off = base_n >> 3;
+   float A2 = A[0+a_off];
+   float *z = e + i_off;
+   float *base = z - 16 * n;
+
+   while (z > base) {
+      float k00,k11;
+      float l00,l11;
+
+      k00    = z[-0] - z[ -8];
+      k11    = z[-1] - z[ -9];
+      l00    = z[-2] - z[-10];
+      l11    = z[-3] - z[-11];
+      z[ -0] = z[-0] + z[ -8];
+      z[ -1] = z[-1] + z[ -9];
+      z[ -2] = z[-2] + z[-10];
+      z[ -3] = z[-3] + z[-11];
+      z[ -8] = k00;
+      z[ -9] = k11;
+      z[-10] = (l00+l11) * A2;
+      z[-11] = (l11-l00) * A2;
+
+      k00    = z[ -4] - z[-12];
+      k11    = z[ -5] - z[-13];
+      l00    = z[ -6] - z[-14];
+      l11    = z[ -7] - z[-15];
+      z[ -4] = z[ -4] + z[-12];
+      z[ -5] = z[ -5] + z[-13];
+      z[ -6] = z[ -6] + z[-14];
+      z[ -7] = z[ -7] + z[-15];
+      z[-12] = k11;
+      z[-13] = -k00;
+      z[-14] = (l11-l00) * A2;
+      z[-15] = (l00+l11) * -A2;
+
+      iter_54(z);
+      iter_54(z-8);
+      z -= 16;
+   }
+}
+
+static void inverse_mdct(float *buffer, int n, vorb *f, int blocktype)
+{
+   int n2 = n >> 1, n4 = n >> 2, n8 = n >> 3, l;
+   int ld;
+   // @OPTIMIZE: reduce register pressure by using fewer variables?
+   int save_point = temp_alloc_save(f);
+   float *buf2 = (float *) temp_alloc(f, n2 * sizeof(*buf2));
+   float *u=NULL,*v=NULL;
+   // twiddle factors
+   float *A = f->A[blocktype];
+
+   // IMDCT algorithm from "The use of multirate filter banks for coding of high quality digital audio"
+   // See notes about bugs in that paper in less-optimal implementation 'inverse_mdct_naive' after this function.
+
+   // kernel from paper
+
+
+   // merged:
+   //   copy and reflect spectral data
+   //   step 0
+
+   // note that it turns out that the items added together during
+   // this step are, in fact, being added to themselves (as reflected
+   // by step 0). inexplicable inefficiency! this became obvious
+   // once I combined the passes.
+
+   // so there's a missing 'times 2' here (for adding X to itself).
+   // this propagates through linearly to the end, where the numbers
+   // are 1/2 too small, and need to be compensated for.
+
+   {
+      float *d,*e, *AA, *e_stop;
+      d = &buf2[n2-2];
+      AA = A;
+      e = &buffer[0];
+      e_stop = &buffer[n2];
+      while (e != e_stop) {
+         d[1] = (e[0] * AA[0] - e[2]*AA[1]);
+         d[0] = (e[0] * AA[1] + e[2]*AA[0]);
+         d -= 2;
+         AA += 2;
+         e += 4;
+      }
+
+      e = &buffer[n2-3];
+      while (d >= buf2) {
+         d[1] = (-e[2] * AA[0] - -e[0]*AA[1]);
+         d[0] = (-e[2] * AA[1] + -e[0]*AA[0]);
+         d -= 2;
+         AA += 2;
+         e -= 4;
+      }
+   }
+
+   // now we use symbolic names for these, so that we can
+   // possibly swap their meaning as we change which operations
+   // are in place
+
+   u = buffer;
+   v = buf2;
+
+   // step 2    (paper output is w, now u)
+   // this could be in place, but the data ends up in the wrong
+   // place... _somebody_'s got to swap it, so this is nominated
+   {
+      float *AA = &A[n2-8];
+      float *d0,*d1, *e0, *e1;
+
+      e0 = &v[n4];
+      e1 = &v[0];
+
+      d0 = &u[n4];
+      d1 = &u[0];
+
+      while (AA >= A) {
+         float v40_20, v41_21;
+
+         v41_21 = e0[1] - e1[1];
+         v40_20 = e0[0] - e1[0];
+         d0[1]  = e0[1] + e1[1];
+         d0[0]  = e0[0] + e1[0];
+         d1[1]  = v41_21*AA[4] - v40_20*AA[5];
+         d1[0]  = v40_20*AA[4] + v41_21*AA[5];
+
+         v41_21 = e0[3] - e1[3];
+         v40_20 = e0[2] - e1[2];
+         d0[3]  = e0[3] + e1[3];
+         d0[2]  = e0[2] + e1[2];
+         d1[3]  = v41_21*AA[0] - v40_20*AA[1];
+         d1[2]  = v40_20*AA[0] + v41_21*AA[1];
+
+         AA -= 8;
+
+         d0 += 4;
+         d1 += 4;
+         e0 += 4;
+         e1 += 4;
+      }
+   }
+
+   // step 3
+   ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+
+   // optimized step 3:
+
+   // the original step3 loop can be nested r inside s or s inside r;
+   // it's written originally as s inside r, but this is dumb when r
+   // iterates many times, and s few. So I have two copies of it and
+   // switch between them halfway.
+
+   // this is iteration 0 of step 3
+   imdct_step3_iter0_loop(n >> 4, u, n2-1-n4*0, -(n >> 3), A);
+   imdct_step3_iter0_loop(n >> 4, u, n2-1-n4*1, -(n >> 3), A);
+
+   // this is iteration 1 of step 3
+   imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*0, -(n >> 4), A, 16);
+   imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*1, -(n >> 4), A, 16);
+   imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*2, -(n >> 4), A, 16);
+   imdct_step3_inner_r_loop(n >> 5, u, n2-1 - n8*3, -(n >> 4), A, 16);
+
+   l=2;
+   for (; l < (ld-3)>>1; ++l) {
+      int k0 = n >> (l+2), k0_2 = k0>>1;
+      int lim = 1 << (l+1);
+      int i;
+      for (i=0; i < lim; ++i)
+         imdct_step3_inner_r_loop(n >> (l+4), u, n2-1 - k0*i, -k0_2, A, 1 << (l+3));
+   }
+
+   for (; l < ld-6; ++l) {
+      int k0 = n >> (l+2), k1 = 1 << (l+3), k0_2 = k0>>1;
+      int rlim = n >> (l+6), r;
+      int lim = 1 << (l+1);
+      int i_off;
+      float *A0 = A;
+      i_off = n2-1;
+      for (r=rlim; r > 0; --r) {
+         imdct_step3_inner_s_loop(lim, u, i_off, -k0_2, A0, k1, k0);
+         A0 += k1*4;
+         i_off -= 8;
+      }
+   }
+
+   // iterations with count:
+   //   ld-6,-5,-4 all interleaved together
+   //       the big win comes from getting rid of needless flops
+   //         due to the constants on pass 5 & 4 being all 1 and 0;
+   //       combining them to be simultaneous to improve cache made little difference
+   imdct_step3_inner_s_loop_ld654(n >> 5, u, n2-1, A, n);
+
+   // output is u
+
+   // step 4, 5, and 6
+   // cannot be in-place because of step 5
+   {
+      uint16 *bitrev = f->bit_reverse[blocktype];
+      // weirdly, I'd have thought reading sequentially and writing
+      // erratically would have been better than vice-versa, but in
+      // fact that's not what my testing showed. (That is, with
+      // j = bitreverse(i), do you read i and write j, or read j and write i.)
+
+      float *d0 = &v[n4-4];
+      float *d1 = &v[n2-4];
+      while (d0 >= v) {
+         int k4;
+
+         k4 = bitrev[0];
+         d1[3] = u[k4+0];
+         d1[2] = u[k4+1];
+         d0[3] = u[k4+2];
+         d0[2] = u[k4+3];
+
+         k4 = bitrev[1];
+         d1[1] = u[k4+0];
+         d1[0] = u[k4+1];
+         d0[1] = u[k4+2];
+         d0[0] = u[k4+3];
+
+         d0 -= 4;
+         d1 -= 4;
+         bitrev += 2;
+      }
+   }
+   // (paper output is u, now v)
+
+
+   // data must be in buf2
+   assert(v == buf2);
+
+   // step 7   (paper output is v, now v)
+   // this is now in place
+   {
+      float *C = f->C[blocktype];
+      float *d, *e;
+
+      d = v;
+      e = v + n2 - 4;
+
+      while (d < e) {
+         float a02,a11,b0,b1,b2,b3;
+
+         a02 = d[0] - e[2];
+         a11 = d[1] + e[3];
+
+         b0 = C[1]*a02 + C[0]*a11;
+         b1 = C[1]*a11 - C[0]*a02;
+
+         b2 = d[0] + e[ 2];
+         b3 = d[1] - e[ 3];
+
+         d[0] = b2 + b0;
+         d[1] = b3 + b1;
+         e[2] = b2 - b0;
+         e[3] = b1 - b3;
+
+         a02 = d[2] - e[0];
+         a11 = d[3] + e[1];
+
+         b0 = C[3]*a02 + C[2]*a11;
+         b1 = C[3]*a11 - C[2]*a02;
+
+         b2 = d[2] + e[ 0];
+         b3 = d[3] - e[ 1];
+
+         d[2] = b2 + b0;
+         d[3] = b3 + b1;
+         e[0] = b2 - b0;
+         e[1] = b1 - b3;
+
+         C += 4;
+         d += 4;
+         e -= 4;
+      }
+   }
+
+   // data must be in buf2
+
+
+   // step 8+decode   (paper output is X, now buffer)
+   // this generates pairs of data a la step 8 and pushes them directly through
+   // the decode kernel (pushing rather than pulling) to avoid having
+   // to make another pass later
+
+   // this cannot POSSIBLY be in place, so we refer to the buffers directly
+
+   {
+      float *d0,*d1,*d2,*d3;
+
+      float *B = f->B[blocktype] + n2 - 8;
+      float *e = buf2 + n2 - 8;
+      d0 = &buffer[0];
+      d1 = &buffer[n2-4];
+      d2 = &buffer[n2];
+      d3 = &buffer[n-4];
+      while (e >= v) {
+         float p0,p1,p2,p3;
+
+         p3 =  e[6]*B[7] - e[7]*B[6];
+         p2 = -e[6]*B[6] - e[7]*B[7];
+
+         d0[0] =   p3;
+         d1[3] = - p3;
+         d2[0] =   p2;
+         d3[3] =   p2;
+
+         p1 =  e[4]*B[5] - e[5]*B[4];
+         p0 = -e[4]*B[4] - e[5]*B[5];
+
+         d0[1] =   p1;
+         d1[2] = - p1;
+         d2[1] =   p0;
+         d3[2] =   p0;
+
+         p3 =  e[2]*B[3] - e[3]*B[2];
+         p2 = -e[2]*B[2] - e[3]*B[3];
+
+         d0[2] =   p3;
+         d1[1] = - p3;
+         d2[2] =   p2;
+         d3[1] =   p2;
+
+         p1 =  e[0]*B[1] - e[1]*B[0];
+         p0 = -e[0]*B[0] - e[1]*B[1];
+
+         d0[3] =   p1;
+         d1[0] = - p1;
+         d2[3] =   p0;
+         d3[0] =   p0;
+
+         B -= 8;
+         e -= 8;
+         d0 += 4;
+         d2 += 4;
+         d1 -= 4;
+         d3 -= 4;
+      }
+   }
+
+   temp_free(f,buf2);
+   temp_alloc_restore(f,save_point);
+}
+
+#if 0
+// this is the original version of the above code, if you want to optimize it from scratch
+void inverse_mdct_naive(float *buffer, int n)
+{
+   float s;
+   float A[1 << 12], B[1 << 12], C[1 << 11];
+   int i,k,k2,k4, n2 = n >> 1, n4 = n >> 2, n8 = n >> 3, l;
+   int n3_4 = n - n4, ld;
+   // how can they claim this only uses N words?!
+   // oh, because they're only used sparsely, whoops
+   float u[1 << 13], X[1 << 13], v[1 << 13], w[1 << 13];
+   // set up twiddle factors
+
+   for (k=k2=0; k < n4; ++k,k2+=2) {
+      A[k2  ] = (float)  cos(4*k*M_PI/n);
+      A[k2+1] = (float) -sin(4*k*M_PI/n);
+      B[k2  ] = (float)  cos((k2+1)*M_PI/n/2);
+      B[k2+1] = (float)  sin((k2+1)*M_PI/n/2);
+   }
+   for (k=k2=0; k < n8; ++k,k2+=2) {
+      C[k2  ] = (float)  cos(2*(k2+1)*M_PI/n);
+      C[k2+1] = (float) -sin(2*(k2+1)*M_PI/n);
+   }
+
+   // IMDCT algorithm from "The use of multirate filter banks for coding of high quality digital audio"
+   // Note there are bugs in that pseudocode, presumably due to them attempting
+   // to rename the arrays nicely rather than representing the way their actual
+   // implementation bounces buffers back and forth. As a result, even in the
+   // "some formulars corrected" version, a direct implementation fails. These
+   // are noted below as "paper bug".
+
+   // copy and reflect spectral data
+   for (k=0; k < n2; ++k) u[k] = buffer[k];
+   for (   ; k < n ; ++k) u[k] = -buffer[n - k - 1];
+   // kernel from paper
+   // step 1
+   for (k=k2=k4=0; k < n4; k+=1, k2+=2, k4+=4) {
+      v[n-k4-1] = (u[k4] - u[n-k4-1]) * A[k2]   - (u[k4+2] - u[n-k4-3])*A[k2+1];
+      v[n-k4-3] = (u[k4] - u[n-k4-1]) * A[k2+1] + (u[k4+2] - u[n-k4-3])*A[k2];
+   }
+   // step 2
+   for (k=k4=0; k < n8; k+=1, k4+=4) {
+      w[n2+3+k4] = v[n2+3+k4] + v[k4+3];
+      w[n2+1+k4] = v[n2+1+k4] + v[k4+1];
+      w[k4+3]    = (v[n2+3+k4] - v[k4+3])*A[n2-4-k4] - (v[n2+1+k4]-v[k4+1])*A[n2-3-k4];
+      w[k4+1]    = (v[n2+1+k4] - v[k4+1])*A[n2-4-k4] + (v[n2+3+k4]-v[k4+3])*A[n2-3-k4];
+   }
+   // step 3
+   ld = ilog(n) - 1; // ilog is off-by-one from normal definitions
+   for (l=0; l < ld-3; ++l) {
+      int k0 = n >> (l+2), k1 = 1 << (l+3);
+      int rlim = n >> (l+4), r4, r;
+      int s2lim = 1 << (l+2), s2;
+      for (r=r4=0; r < rlim; r4+=4,++r) {
+         for (s2=0; s2 < s2lim; s2+=2) {
+            u[n-1-k0*s2-r4] = w[n-1-k0*s2-r4] + w[n-1-k0*(s2+1)-r4];
+            u[n-3-k0*s2-r4] = w[n-3-k0*s2-r4] + w[n-3-k0*(s2+1)-r4];
+            u[n-1-k0*(s2+1)-r4] = (w[n-1-k0*s2-r4] - w[n-1-k0*(s2+1)-r4]) * A[r*k1]
+                                - (w[n-3-k0*s2-r4] - w[n-3-k0*(s2+1)-r4]) * A[r*k1+1];
+            u[n-3-k0*(s2+1)-r4] = (w[n-3-k0*s2-r4] - w[n-3-k0*(s2+1)-r4]) * A[r*k1]
+                                + (w[n-1-k0*s2-r4] - w[n-1-k0*(s2+1)-r4]) * A[r*k1+1];
+         }
+      }
+      if (l+1 < ld-3) {
+         // paper bug: ping-ponging of u&w here is omitted
+         memcpy(w, u, sizeof(u));
+      }
+   }
+
+   // step 4
+   for (i=0; i < n8; ++i) {
+      int j = bit_reverse(i) >> (32-ld+3);
+      assert(j < n8);
+      if (i == j) {
+         // paper bug: original code probably swapped in place; if copying,
+         //            need to directly copy in this case
+         int i8 = i << 3;
+         v[i8+1] = u[i8+1];
+         v[i8+3] = u[i8+3];
+         v[i8+5] = u[i8+5];
+         v[i8+7] = u[i8+7];
+      } else if (i < j) {
+         int i8 = i << 3, j8 = j << 3;
+         v[j8+1] = u[i8+1], v[i8+1] = u[j8 + 1];
+         v[j8+3] = u[i8+3], v[i8+3] = u[j8 + 3];
+         v[j8+5] = u[i8+5], v[i8+5] = u[j8 + 5];
+         v[j8+7] = u[i8+7], v[i8+7] = u[j8 + 7];
+      }
+   }
+   // step 5
+   for (k=0; k < n2; ++k) {
+      w[k] = v[k*2+1];
+   }
+   // step 6
+   for (k=k2=k4=0; k < n8; ++k, k2 += 2, k4 += 4) {
+      u[n-1-k2] = w[k4];
+      u[n-2-k2] = w[k4+1];
+      u[n3_4 - 1 - k2] = w[k4+2];
+      u[n3_4 - 2 - k2] = w[k4+3];
+   }
+   // step 7
+   for (k=k2=0; k < n8; ++k, k2 += 2) {
+      v[n2 + k2 ] = ( u[n2 + k2] + u[n-2-k2] + C[k2+1]*(u[n2+k2]-u[n-2-k2]) + C[k2]*(u[n2+k2+1]+u[n-2-k2+1]))/2;
+      v[n-2 - k2] = ( u[n2 + k2] + u[n-2-k2] - C[k2+1]*(u[n2+k2]-u[n-2-k2]) - C[k2]*(u[n2+k2+1]+u[n-2-k2+1]))/2;
+      v[n2+1+ k2] = ( u[n2+1+k2] - u[n-1-k2] + C[k2+1]*(u[n2+1+k2]+u[n-1-k2]) - C[k2]*(u[n2+k2]-u[n-2-k2]))/2;
+      v[n-1 - k2] = (-u[n2+1+k2] + u[n-1-k2] + C[k2+1]*(u[n2+1+k2]+u[n-1-k2]) - C[k2]*(u[n2+k2]-u[n-2-k2]))/2;
+   }
+   // step 8
+   for (k=k2=0; k < n4; ++k,k2 += 2) {
+      X[k]      = v[k2+n2]*B[k2  ] + v[k2+1+n2]*B[k2+1];
+      X[n2-1-k] = v[k2+n2]*B[k2+1] - v[k2+1+n2]*B[k2  ];
+   }
+
+   // decode kernel to output
+   // determined the following value experimentally
+   // (by first figuring out what made inverse_mdct_slow work); then matching that here
+   // (probably vorbis encoder premultiplies by n or n/2, to save it on the decoder?)
+   s = 0.5; // theoretically would be n4
+
+   // [[[ note! the s value of 0.5 is compensated for by the B[] in the current code,
+   //     so it needs to use the "old" B values to behave correctly, or else
+   //     set s to 1.0 ]]]
+   for (i=0; i < n4  ; ++i) buffer[i] = s * X[i+n4];
+   for (   ; i < n3_4; ++i) buffer[i] = -s * X[n3_4 - i - 1];
+   for (   ; i < n   ; ++i) buffer[i] = -s * X[i - n3_4];
+}
+#endif
+
+static float *get_window(vorb *f, int len)
+{
+   len <<= 1;
+   if (len == f->blocksize_0) return f->window[0];
+   if (len == f->blocksize_1) return f->window[1];
+   return NULL;
+}
+
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+typedef int16 YTYPE;
+#else
+typedef int YTYPE;
+#endif
+static int do_floor(vorb *f, Mapping *map, int i, int n, float *target, YTYPE *finalY, uint8 *step2_flag)
+{
+   int n2 = n >> 1;
+   int s = map->chan[i].mux, floor;
+   floor = map->submap_floor[s];
+   if (f->floor_types[floor] == 0) {
+      return error(f, VORBIS_invalid_stream);
+   } else {
+      Floor1 *g = &f->floor_config[floor].floor1;
+      int j,q;
+      int lx = 0, ly = finalY[0] * g->floor1_multiplier;
+      for (q=1; q < g->values; ++q) {
+         j = g->sorted_order[q];
+         #ifndef STB_VORBIS_NO_DEFER_FLOOR
+         STBV_NOTUSED(step2_flag);
+         if (finalY[j] >= 0)
+         #else
+         if (step2_flag[j])
+         #endif
+         {
+            int hy = finalY[j] * g->floor1_multiplier;
+            int hx = g->Xlist[j];
+            if (lx != hx)
+               draw_line(target, lx,ly, hx,hy, n2);
+            CHECK(f);
+            lx = hx, ly = hy;
+         }
+      }
+      if (lx < n2) {
+         // optimization of: draw_line(target, lx,ly, n,ly, n2);
+         for (j=lx; j < n2; ++j)
+            LINE_OP(target[j], inverse_db_table[ly]);
+         CHECK(f);
+      }
+   }
+   return TRUE;
+}
+
+// The meaning of "left" and "right"
+//
+// For a given frame:
+//     we compute samples from 0..n
+//     window_center is n/2
+//     we'll window and mix the samples from left_start to left_end with data from the previous frame
+//     all of the samples from left_end to right_start can be output without mixing; however,
+//        this interval is 0-length except when transitioning between short and long frames
+//     all of the samples from right_start to right_end need to be mixed with the next frame,
+//        which we don't have, so those get saved in a buffer
+//     frame N's right_end-right_start, the number of samples to mix with the next frame,
+//        has to be the same as frame N+1's left_end-left_start (which they are by
+//        construction)
+
+static int vorbis_decode_initial(vorb *f, int *p_left_start, int *p_left_end, int *p_right_start, int *p_right_end, int *mode)
+{
+   Mode *m;
+   int i, n, prev, next, window_center;
+   f->channel_buffer_start = f->channel_buffer_end = 0;
+
+  retry:
+   if (f->eof) return FALSE;
+   if (!maybe_start_packet(f))
+      return FALSE;
+   // check packet type
+   if (get_bits(f,1) != 0) {
+      if (IS_PUSH_MODE(f))
+         return error(f,VORBIS_bad_packet_type);
+      while (EOP != get8_packet(f));
+      goto retry;
+   }
+
+   if (f->alloc.alloc_buffer)
+      assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+
+   i = get_bits(f, ilog(f->mode_count-1));
+   if (i == EOP) return FALSE;
+   if (i >= f->mode_count) return FALSE;
+   *mode = i;
+   m = f->mode_config + i;
+   if (m->blockflag) {
+      n = f->blocksize_1;
+      prev = get_bits(f,1);
+      next = get_bits(f,1);
+   } else {
+      prev = next = 0;
+      n = f->blocksize_0;
+   }
+
+// WINDOWING
+
+   window_center = n >> 1;
+   if (m->blockflag && !prev) {
+      *p_left_start = (n - f->blocksize_0) >> 2;
+      *p_left_end   = (n + f->blocksize_0) >> 2;
+   } else {
+      *p_left_start = 0;
+      *p_left_end   = window_center;
+   }
+   if (m->blockflag && !next) {
+      *p_right_start = (n*3 - f->blocksize_0) >> 2;
+      *p_right_end   = (n*3 + f->blocksize_0) >> 2;
+   } else {
+      *p_right_start = window_center;
+      *p_right_end   = n;
+   }
+
+   return TRUE;
+}
+
+static int vorbis_decode_packet_rest(vorb *f, int *len, Mode *m, int left_start, int left_end, int right_start, int right_end, int *p_left)
+{
+   Mapping *map;
+   int i,j,k,n,n2;
+   int zero_channel[256];
+   int really_zero_channel[256];
+
+// WINDOWING
+
+   STBV_NOTUSED(left_end);
+   n = f->blocksize[m->blockflag];
+   map = &f->mapping[m->mapping];
+
+// FLOORS
+   n2 = n >> 1;
+
+   CHECK(f);
+
+   for (i=0; i < f->channels; ++i) {
+      int s = map->chan[i].mux, floor;
+      zero_channel[i] = FALSE;
+      floor = map->submap_floor[s];
+      if (f->floor_types[floor] == 0) {
+         return error(f, VORBIS_invalid_stream);
+      } else {
+         Floor1 *g = &f->floor_config[floor].floor1;
+         if (get_bits(f, 1)) {
+            short *finalY;
+            uint8 step2_flag[256];
+            static int range_list[4] = { 256, 128, 86, 64 };
+            int range = range_list[g->floor1_multiplier-1];
+            int offset = 2;
+            finalY = f->finalY[i];
+            finalY[0] = get_bits(f, ilog(range)-1);
+            finalY[1] = get_bits(f, ilog(range)-1);
+            for (j=0; j < g->partitions; ++j) {
+               int pclass = g->partition_class_list[j];
+               int cdim = g->class_dimensions[pclass];
+               int cbits = g->class_subclasses[pclass];
+               int csub = (1 << cbits)-1;
+               int cval = 0;
+               if (cbits) {
+                  Codebook *c = f->codebooks + g->class_masterbooks[pclass];
+                  DECODE(cval,f,c);
+               }
+               for (k=0; k < cdim; ++k) {
+                  int book = g->subclass_books[pclass][cval & csub];
+                  cval = cval >> cbits;
+                  if (book >= 0) {
+                     int temp;
+                     Codebook *c = f->codebooks + book;
+                     DECODE(temp,f,c);
+                     finalY[offset++] = temp;
+                  } else
+                     finalY[offset++] = 0;
+               }
+            }
+            if (f->valid_bits == INVALID_BITS) goto error; // behavior according to spec
+            step2_flag[0] = step2_flag[1] = 1;
+            for (j=2; j < g->values; ++j) {
+               int low, high, pred, highroom, lowroom, room, val;
+               low = g->neighbors[j][0];
+               high = g->neighbors[j][1];
+               //neighbors(g->Xlist, j, &low, &high);
+               pred = predict_point(g->Xlist[j], g->Xlist[low], g->Xlist[high], finalY[low], finalY[high]);
+               val = finalY[j];
+               highroom = range - pred;
+               lowroom = pred;
+               if (highroom < lowroom)
+                  room = highroom * 2;
+               else
+                  room = lowroom * 2;
+               if (val) {
+                  step2_flag[low] = step2_flag[high] = 1;
+                  step2_flag[j] = 1;
+                  if (val >= room)
+                     if (highroom > lowroom)
+                        finalY[j] = val - lowroom + pred;
+                     else
+                        finalY[j] = pred - val + highroom - 1;
+                  else
+                     if (val & 1)
+                        finalY[j] = pred - ((val+1)>>1);
+                     else
+                        finalY[j] = pred + (val>>1);
+               } else {
+                  step2_flag[j] = 0;
+                  finalY[j] = pred;
+               }
+            }
+
+#ifdef STB_VORBIS_NO_DEFER_FLOOR
+            do_floor(f, map, i, n, f->floor_buffers[i], finalY, step2_flag);
+#else
+            // defer final floor computation until _after_ residue
+            for (j=0; j < g->values; ++j) {
+               if (!step2_flag[j])
+                  finalY[j] = -1;
+            }
+#endif
+         } else {
+           error:
+            zero_channel[i] = TRUE;
+         }
+         // So we just defer everything else to later
+
+         // at this point we've decoded the floor into buffer
+      }
+   }
+   CHECK(f);
+   // at this point we've decoded all floors
+
+   if (f->alloc.alloc_buffer)
+      assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+
+   // re-enable coupled channels if necessary
+   memcpy(really_zero_channel, zero_channel, sizeof(really_zero_channel[0]) * f->channels);
+   for (i=0; i < map->coupling_steps; ++i)
+      if (!zero_channel[map->chan[i].magnitude] || !zero_channel[map->chan[i].angle]) {
+         zero_channel[map->chan[i].magnitude] = zero_channel[map->chan[i].angle] = FALSE;
+      }
+
+   CHECK(f);
+// RESIDUE DECODE
+   for (i=0; i < map->submaps; ++i) {
+      float *residue_buffers[STB_VORBIS_MAX_CHANNELS];
+      int r;
+      uint8 do_not_decode[256];
+      int ch = 0;
+      for (j=0; j < f->channels; ++j) {
+         if (map->chan[j].mux == i) {
+            if (zero_channel[j]) {
+               do_not_decode[ch] = TRUE;
+               residue_buffers[ch] = NULL;
+            } else {
+               do_not_decode[ch] = FALSE;
+               residue_buffers[ch] = f->channel_buffers[j];
+            }
+            ++ch;
+         }
+      }
+      r = map->submap_residue[i];
+      decode_residue(f, residue_buffers, ch, n2, r, do_not_decode);
+   }
+
+   if (f->alloc.alloc_buffer)
+      assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+   CHECK(f);
+
+// INVERSE COUPLING
+   for (i = map->coupling_steps-1; i >= 0; --i) {
+      int n2 = n >> 1;
+      float *m = f->channel_buffers[map->chan[i].magnitude];
+      float *a = f->channel_buffers[map->chan[i].angle    ];
+      for (j=0; j < n2; ++j) {
+         float a2,m2;
+         if (m[j] > 0)
+            if (a[j] > 0)
+               m2 = m[j], a2 = m[j] - a[j];
+            else
+               a2 = m[j], m2 = m[j] + a[j];
+         else
+            if (a[j] > 0)
+               m2 = m[j], a2 = m[j] + a[j];
+            else
+               a2 = m[j], m2 = m[j] - a[j];
+         m[j] = m2;
+         a[j] = a2;
+      }
+   }
+   CHECK(f);
+
+   // finish decoding the floors
+#ifndef STB_VORBIS_NO_DEFER_FLOOR
+   for (i=0; i < f->channels; ++i) {
+      if (really_zero_channel[i]) {
+         memset(f->channel_buffers[i], 0, sizeof(*f->channel_buffers[i]) * n2);
+      } else {
+         do_floor(f, map, i, n, f->channel_buffers[i], f->finalY[i], NULL);
+      }
+   }
+#else
+   for (i=0; i < f->channels; ++i) {
+      if (really_zero_channel[i]) {
+         memset(f->channel_buffers[i], 0, sizeof(*f->channel_buffers[i]) * n2);
+      } else {
+         for (j=0; j < n2; ++j)
+            f->channel_buffers[i][j] *= f->floor_buffers[i][j];
+      }
+   }
+#endif
+
+// INVERSE MDCT
+   CHECK(f);
+   for (i=0; i < f->channels; ++i)
+      inverse_mdct(f->channel_buffers[i], n, f, m->blockflag);
+   CHECK(f);
+
+   // this shouldn't be necessary, unless we exited on an error
+   // and want to flush to get to the next packet
+   flush_packet(f);
+
+   if (f->first_decode) {
+      // assume we start so first non-discarded sample is sample 0
+      // this isn't to spec, but spec would require us to read ahead
+      // and decode the size of all current frames--could be done,
+      // but presumably it's not a commonly used feature
+      f->current_loc = 0u - n2; // start of first frame is positioned for discard (NB this is an intentional unsigned overflow/wrap-around)
+      // we might have to discard samples "from" the next frame too,
+      // if we're lapping a large block and then a small one at the start?
+      f->discard_samples_deferred = n - right_end;
+      f->current_loc_valid = TRUE;
+      f->first_decode = FALSE;
+   } else if (f->discard_samples_deferred) {
+      if (f->discard_samples_deferred >= right_start - left_start) {
+         f->discard_samples_deferred -= (right_start - left_start);
+         left_start = right_start;
+         *p_left = left_start;
+      } else {
+         left_start += f->discard_samples_deferred;
+         *p_left = left_start;
+         f->discard_samples_deferred = 0;
+      }
+   } else if (f->previous_length == 0 && f->current_loc_valid) {
+      // we're recovering from a seek... that means we're going to discard
+      // the samples from this packet even though we know our position from
+      // the last page header, so we need to update the position based on
+      // the discarded samples here
+      // but wait, the code below is going to add this in itself even
+      // on a discard, so we don't need to do it here...
+   }
+
+   // check if we have ogg information about the sample # for this packet
+   if (f->last_seg_which == f->end_seg_with_known_loc) {
+      // if we have a valid current loc, and this is final:
+      if (f->current_loc_valid && (f->page_flag & PAGEFLAG_last_page)) {
+         uint32 current_end = f->known_loc_for_packet;
+         // then let's infer the size of the (probably) short final frame
+         if (current_end < f->current_loc + (right_end-left_start)) {
+            if (current_end < f->current_loc) {
+               // negative truncation, that's impossible!
+               *len = 0;
+            } else {
+               *len = current_end - f->current_loc;
+            }
+            *len += left_start; // this doesn't seem right, but has no ill effect on my test files
+            if (*len > right_end) *len = right_end; // this should never happen
+            f->current_loc += *len;
+            return TRUE;
+         }
+      }
+      // otherwise, just set our sample loc
+      // guess that the ogg granule pos refers to the _middle_ of the
+      // last frame?
+      // set f->current_loc to the position of left_start
+      f->current_loc = f->known_loc_for_packet - (n2-left_start);
+      f->current_loc_valid = TRUE;
+   }
+   if (f->current_loc_valid)
+      f->current_loc += (right_start - left_start);
+
+   if (f->alloc.alloc_buffer)
+      assert(f->alloc.alloc_buffer_length_in_bytes == f->temp_offset);
+   *len = right_end;  // ignore samples after the window goes to 0
+   CHECK(f);
+
+   return TRUE;
+}
+
+static int vorbis_decode_packet(vorb *f, int *len, int *p_left, int *p_right)
+{
+   int mode, left_end, right_end;
+   if (!vorbis_decode_initial(f, p_left, &left_end, p_right, &right_end, &mode)) return 0;
+   return vorbis_decode_packet_rest(f, len, f->mode_config + mode, *p_left, left_end, *p_right, right_end, p_left);
+}
+
+static int vorbis_finish_frame(stb_vorbis *f, int len, int left, int right)
+{
+   int prev,i,j;
+   // we use right&left (the start of the right- and left-window sin()-regions)
+   // to determine how much to return, rather than inferring from the rules
+   // (same result, clearer code); 'left' indicates where our sin() window
+   // starts, therefore where the previous window's right edge starts, and
+   // therefore where to start mixing from the previous buffer. 'right'
+   // indicates where our sin() ending-window starts, therefore that's where
+   // we start saving, and where our returned-data ends.
+
+   // mixin from previous window
+   if (f->previous_length) {
+      int i,j, n = f->previous_length;
+      float *w = get_window(f, n);
+      if (w == NULL) return 0;
+      for (i=0; i < f->channels; ++i) {
+         for (j=0; j < n; ++j)
+            f->channel_buffers[i][left+j] =
+               f->channel_buffers[i][left+j]*w[    j] +
+               f->previous_window[i][     j]*w[n-1-j];
+      }
+   }
+
+   prev = f->previous_length;
+
+   // last half of this data becomes previous window
+   f->previous_length = len - right;
+
+   // @OPTIMIZE: could avoid this copy by double-buffering the
+   // output (flipping previous_window with channel_buffers), but
+   // then previous_window would have to be 2x as large, and
+   // channel_buffers couldn't be temp mem (although they're NOT
+   // currently temp mem, they could be (unless we want to level
+   // performance by spreading out the computation))
+   for (i=0; i < f->channels; ++i)
+      for (j=0; right+j < len; ++j)
+         f->previous_window[i][j] = f->channel_buffers[i][right+j];
+
+   if (!prev)
+      // there was no previous packet, so this data isn't valid...
+      // this isn't entirely true, only the would-have-overlapped data
+      // isn't valid, but this seems to be what the spec requires
+      return 0;
+
+   // truncate a short frame
+   if (len < right) right = len;
+
+   f->samples_output += right-left;
+
+   return right - left;
+}
+
+static int vorbis_pump_first_frame(stb_vorbis *f)
+{
+   int len, right, left, res;
+   res = vorbis_decode_packet(f, &len, &left, &right);
+   if (res)
+      vorbis_finish_frame(f, len, left, right);
+   return res;
+}
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+static int is_whole_packet_present(stb_vorbis *f)
+{
+   // make sure that we have the packet available before continuing...
+   // this requires a full ogg parse, but we know we can fetch from f->stream
+
+   // instead of coding this out explicitly, we could save the current read state,
+   // read the next packet with get8() until end-of-packet, check f->eof, then
+   // reset the state? but that would be slower, esp. since we'd have over 256 bytes
+   // of state to restore (primarily the page segment table)
+
+   int s = f->next_seg, first = TRUE;
+   uint8 *p = f->stream;
+
+   if (s != -1) { // if we're not starting the packet with a 'continue on next page' flag
+      for (; s < f->segment_count; ++s) {
+         p += f->segments[s];
+         if (f->segments[s] < 255)               // stop at first short segment
+            break;
+      }
+      // either this continues, or it ends it...
+      if (s == f->segment_count)
+         s = -1; // set 'crosses page' flag
+      if (p > f->stream_end)                     return error(f, VORBIS_need_more_data);
+      first = FALSE;
+   }
+   for (; s == -1;) {
+      uint8 *q;
+      int n;
+
+      // check that we have the page header ready
+      if (p + 26 >= f->stream_end)               return error(f, VORBIS_need_more_data);
+      // validate the page
+      if (memcmp(p, ogg_page_header, 4))         return error(f, VORBIS_invalid_stream);
+      if (p[4] != 0)                             return error(f, VORBIS_invalid_stream);
+      if (first) { // the first segment must NOT have 'continued_packet', later ones MUST
+         if (f->previous_length)
+            if ((p[5] & PAGEFLAG_continued_packet))  return error(f, VORBIS_invalid_stream);
+         // if no previous length, we're resynching, so we can come in on a continued-packet,
+         // which we'll just drop
+      } else {
+         if (!(p[5] & PAGEFLAG_continued_packet)) return error(f, VORBIS_invalid_stream);
+      }
+      n = p[26]; // number of segments on this page
+      q = p+27;  // q points to segment table
+      p = q + n; // advance past header
+      // make sure we've read the segment table
+      if (p > f->stream_end)                     return error(f, VORBIS_need_more_data);
+      for (s=0; s < n; ++s) {
+         p += q[s];
+         if (q[s] < 255)
+            break;
+      }
+      if (s == n)
+         s = -1; // set 'crosses page' flag
+      if (p > f->stream_end)                     return error(f, VORBIS_need_more_data);
+      first = FALSE;
+   }
+   return TRUE;
+}
+#endif // !STB_VORBIS_NO_PUSHDATA_API
+
+static int start_decoder(vorb *f)
+{
+   uint8 header[6], x,y;
+   int len,i,j,k, max_submaps = 0;
+   int longest_floorlist=0;
+
+   // first page, first packet
+   f->first_decode = TRUE;
+
+   if (!start_page(f))                              return FALSE;
+   // validate page flag
+   if (!(f->page_flag & PAGEFLAG_first_page))       return error(f, VORBIS_invalid_first_page);
+   if (f->page_flag & PAGEFLAG_last_page)           return error(f, VORBIS_invalid_first_page);
+   if (f->page_flag & PAGEFLAG_continued_packet)    return error(f, VORBIS_invalid_first_page);
+   // check for expected packet length
+   if (f->segment_count != 1)                       return error(f, VORBIS_invalid_first_page);
+   if (f->segments[0] != 30) {
+      // check for the Ogg skeleton fishead identifying header to refine our error
+      if (f->segments[0] == 64 &&
+          getn(f, header, 6) &&
+          header[0] == 'f' &&
+          header[1] == 'i' &&
+          header[2] == 's' &&
+          header[3] == 'h' &&
+          header[4] == 'e' &&
+          header[5] == 'a' &&
+          get8(f)   == 'd' &&
+          get8(f)   == '\0')                        return error(f, VORBIS_ogg_skeleton_not_supported);
+      else
+                                                    return error(f, VORBIS_invalid_first_page);
+   }
+
+   // read packet
+   // check packet header
+   if (get8(f) != VORBIS_packet_id)                 return error(f, VORBIS_invalid_first_page);
+   if (!getn(f, header, 6))                         return error(f, VORBIS_unexpected_eof);
+   if (!vorbis_validate(header))                    return error(f, VORBIS_invalid_first_page);
+   // vorbis_version
+   if (get32(f) != 0)                               return error(f, VORBIS_invalid_first_page);
+   f->channels = get8(f); if (!f->channels)         return error(f, VORBIS_invalid_first_page);
+   if (f->channels > STB_VORBIS_MAX_CHANNELS)       return error(f, VORBIS_too_many_channels);
+   f->sample_rate = get32(f); if (!f->sample_rate)  return error(f, VORBIS_invalid_first_page);
+   get32(f); // bitrate_maximum
+   get32(f); // bitrate_nominal
+   get32(f); // bitrate_minimum
+   x = get8(f);
+   {
+      int log0,log1;
+      log0 = x & 15;
+      log1 = x >> 4;
+      f->blocksize_0 = 1 << log0;
+      f->blocksize_1 = 1 << log1;
+      if (log0 < 6 || log0 > 13)                       return error(f, VORBIS_invalid_setup);
+      if (log1 < 6 || log1 > 13)                       return error(f, VORBIS_invalid_setup);
+      if (log0 > log1)                                 return error(f, VORBIS_invalid_setup);
+   }
+
+   // framing_flag
+   x = get8(f);
+   if (!(x & 1))                                    return error(f, VORBIS_invalid_first_page);
+
+   // second packet!
+   if (!start_page(f))                              return FALSE;
+
+   if (!start_packet(f))                            return FALSE;
+
+   if (!next_segment(f))                            return FALSE;
+
+   if (get8_packet(f) != VORBIS_packet_comment)            return error(f, VORBIS_invalid_setup);
+   for (i=0; i < 6; ++i) header[i] = get8_packet(f);
+   if (!vorbis_validate(header))                    return error(f, VORBIS_invalid_setup);
+   //file vendor
+   len = get32_packet(f);
+   f->vendor = (char*)setup_malloc(f, sizeof(char) * (len+1));
+   if (f->vendor == NULL)                           return error(f, VORBIS_outofmem);
+   for(i=0; i < len; ++i) {
+      f->vendor[i] = get8_packet(f);
+   }
+   f->vendor[len] = (char)'\0';
+   //user comments
+   f->comment_list_length = get32_packet(f);
+   f->comment_list = NULL;
+   if (f->comment_list_length > 0)
+   {
+      f->comment_list = (char**) setup_malloc(f, sizeof(char*) * (f->comment_list_length));
+      if (f->comment_list == NULL)                  return error(f, VORBIS_outofmem);
+   }
+
+   for(i=0; i < f->comment_list_length; ++i) {
+      len = get32_packet(f);
+      f->comment_list[i] = (char*)setup_malloc(f, sizeof(char) * (len+1));
+      if (f->comment_list[i] == NULL)               return error(f, VORBIS_outofmem);
+
+      for(j=0; j < len; ++j) {
+         f->comment_list[i][j] = get8_packet(f);
+      }
+      f->comment_list[i][len] = (char)'\0';
+   }
+
+   // framing_flag
+   x = get8_packet(f);
+   if (!(x & 1))                                    return error(f, VORBIS_invalid_setup);
+
+
+   skip(f, f->bytes_in_seg);
+   f->bytes_in_seg = 0;
+
+   do {
+      len = next_segment(f);
+      skip(f, len);
+      f->bytes_in_seg = 0;
+   } while (len);
+
+   // third packet!
+   if (!start_packet(f))                            return FALSE;
+
+   #ifndef STB_VORBIS_NO_PUSHDATA_API
+   if (IS_PUSH_MODE(f)) {
+      if (!is_whole_packet_present(f)) {
+         // convert error in ogg header to the right type
+         if (f->error == VORBIS_invalid_stream)
+            f->error = VORBIS_invalid_setup;
+         return FALSE;
+      }
+   }
+   #endif
+
+   crc32_init(); // always init it, to avoid multithread race conditions
+
+   if (get8_packet(f) != VORBIS_packet_setup)       return error(f, VORBIS_invalid_setup);
+   for (i=0; i < 6; ++i) header[i] = get8_packet(f);
+   if (!vorbis_validate(header))                    return error(f, VORBIS_invalid_setup);
+
+   // codebooks
+
+   f->codebook_count = get_bits(f,8) + 1;
+   f->codebooks = (Codebook *) setup_malloc(f, sizeof(*f->codebooks) * f->codebook_count);
+   if (f->codebooks == NULL)                        return error(f, VORBIS_outofmem);
+   memset(f->codebooks, 0, sizeof(*f->codebooks) * f->codebook_count);
+   for (i=0; i < f->codebook_count; ++i) {
+      uint32 *values;
+      int ordered, sorted_count;
+      int total=0;
+      uint8 *lengths;
+      Codebook *c = f->codebooks+i;
+      CHECK(f);
+      x = get_bits(f, 8); if (x != 0x42)            return error(f, VORBIS_invalid_setup);
+      x = get_bits(f, 8); if (x != 0x43)            return error(f, VORBIS_invalid_setup);
+      x = get_bits(f, 8); if (x != 0x56)            return error(f, VORBIS_invalid_setup);
+      x = get_bits(f, 8);
+      c->dimensions = (get_bits(f, 8)<<8) + x;
+      x = get_bits(f, 8);
+      y = get_bits(f, 8);
+      c->entries = (get_bits(f, 8)<<16) + (y<<8) + x;
+      ordered = get_bits(f,1);
+      c->sparse = ordered ? 0 : get_bits(f,1);
+
+      if (c->dimensions == 0 && c->entries != 0)    return error(f, VORBIS_invalid_setup);
+
+      if (c->sparse)
+         lengths = (uint8 *) setup_temp_malloc(f, c->entries);
+      else
+         lengths = c->codeword_lengths = (uint8 *) setup_malloc(f, c->entries);
+
+      if (!lengths) return error(f, VORBIS_outofmem);
+
+      if (ordered) {
+         int current_entry = 0;
+         int current_length = get_bits(f,5) + 1;
+         while (current_entry < c->entries) {
+            int limit = c->entries - current_entry;
+            int n = get_bits(f, ilog(limit));
+            if (current_length >= 32) return error(f, VORBIS_invalid_setup);
+            if (current_entry + n > (int) c->entries) { return error(f, VORBIS_invalid_setup); }
+            memset(lengths + current_entry, current_length, n);
+            current_entry += n;
+            ++current_length;
+         }
+      } else {
+         for (j=0; j < c->entries; ++j) {
+            int present = c->sparse ? get_bits(f,1) : 1;
+            if (present) {
+               lengths[j] = get_bits(f, 5) + 1;
+               ++total;
+               if (lengths[j] == 32)
+                  return error(f, VORBIS_invalid_setup);
+            } else {
+               lengths[j] = NO_CODE;
+            }
+         }
+      }
+
+      if (c->sparse && total >= c->entries >> 2) {
+         // convert sparse items to non-sparse!
+         if (c->entries > (int) f->setup_temp_memory_required)
+            f->setup_temp_memory_required = c->entries;
+
+         c->codeword_lengths = (uint8 *) setup_malloc(f, c->entries);
+         if (c->codeword_lengths == NULL) return error(f, VORBIS_outofmem);
+         memcpy(c->codeword_lengths, lengths, c->entries);
+         setup_temp_free(f, lengths, c->entries); // note this is only safe if there have been no intervening temp mallocs!
+         lengths = c->codeword_lengths;
+         c->sparse = 0;
+      }
+
+      // compute the size of the sorted tables
+      if (c->sparse) {
+         sorted_count = total;
+      } else {
+         sorted_count = 0;
+         #ifndef STB_VORBIS_NO_HUFFMAN_BINARY_SEARCH
+         for (j=0; j < c->entries; ++j)
+            if (lengths[j] > STB_VORBIS_FAST_HUFFMAN_LENGTH && lengths[j] != NO_CODE)
+               ++sorted_count;
+         #endif
+      }
+
+      c->sorted_entries = sorted_count;
+      values = NULL;
+
+      CHECK(f);
+      if (!c->sparse) {
+         c->codewords = (uint32 *) setup_malloc(f, sizeof(c->codewords[0]) * c->entries);
+         if (!c->codewords)                  return error(f, VORBIS_outofmem);
+      } else {
+         unsigned int size;
+         if (c->sorted_entries) {
+            c->codeword_lengths = (uint8 *) setup_malloc(f, c->sorted_entries);
+            if (!c->codeword_lengths)           return error(f, VORBIS_outofmem);
+            c->codewords = (uint32 *) setup_temp_malloc(f, sizeof(*c->codewords) * c->sorted_entries);
+            if (!c->codewords)                  return error(f, VORBIS_outofmem);
+            values = (uint32 *) setup_temp_malloc(f, sizeof(*values) * c->sorted_entries);
+            if (!values)                        return error(f, VORBIS_outofmem);
+         }
+         size = c->entries + (sizeof(*c->codewords) + sizeof(*values)) * c->sorted_entries;
+         if (size > f->setup_temp_memory_required)
+            f->setup_temp_memory_required = size;
+      }
+
+      if (!compute_codewords(c, lengths, c->entries, values)) {
+         if (c->sparse) setup_temp_free(f, values, 0);
+         return error(f, VORBIS_invalid_setup);
+      }
+
+      if (c->sorted_entries) {
+         // allocate an extra slot for sentinels
+         c->sorted_codewords = (uint32 *) setup_malloc(f, sizeof(*c->sorted_codewords) * (c->sorted_entries+1));
+         if (c->sorted_codewords == NULL) return error(f, VORBIS_outofmem);
+         // allocate an extra slot at the front so that c->sorted_values[-1] is defined
+         // so that we can catch that case without an extra if
+         c->sorted_values    = ( int   *) setup_malloc(f, sizeof(*c->sorted_values   ) * (c->sorted_entries+1));
+         if (c->sorted_values == NULL) return error(f, VORBIS_outofmem);
+         ++c->sorted_values;
+         c->sorted_values[-1] = -1;
+         compute_sorted_huffman(c, lengths, values);
+      }
+
+      if (c->sparse) {
+         setup_temp_free(f, values, sizeof(*values)*c->sorted_entries);
+         setup_temp_free(f, c->codewords, sizeof(*c->codewords)*c->sorted_entries);
+         setup_temp_free(f, lengths, c->entries);
+         c->codewords = NULL;
+      }
+
+      compute_accelerated_huffman(c);
+
+      CHECK(f);
+      c->lookup_type = get_bits(f, 4);
+      if (c->lookup_type > 2) return error(f, VORBIS_invalid_setup);
+      if (c->lookup_type > 0) {
+         uint16 *mults;
+         c->minimum_value = float32_unpack(get_bits(f, 32));
+         c->delta_value = float32_unpack(get_bits(f, 32));
+         c->value_bits = get_bits(f, 4)+1;
+         c->sequence_p = get_bits(f,1);
+         if (c->lookup_type == 1) {
+            int values = lookup1_values(c->entries, c->dimensions);
+            if (values < 0) return error(f, VORBIS_invalid_setup);
+            c->lookup_values = (uint32) values;
+         } else {
+            c->lookup_values = c->entries * c->dimensions;
+         }
+         if (c->lookup_values == 0) return error(f, VORBIS_invalid_setup);
+         mults = (uint16 *) setup_temp_malloc(f, sizeof(mults[0]) * c->lookup_values);
+         if (mults == NULL) return error(f, VORBIS_outofmem);
+         for (j=0; j < (int) c->lookup_values; ++j) {
+            int q = get_bits(f, c->value_bits);
+            if (q == EOP) { setup_temp_free(f,mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_invalid_setup); }
+            mults[j] = q;
+         }
+
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+         if (c->lookup_type == 1) {
+            int len, sparse = c->sparse;
+            float last=0;
+            // pre-expand the lookup1-style multiplicands, to avoid a divide in the inner loop
+            if (sparse) {
+               if (c->sorted_entries == 0) goto skip;
+               c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->sorted_entries * c->dimensions);
+            } else
+               c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->entries        * c->dimensions);
+            if (c->multiplicands == NULL) { setup_temp_free(f,mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_outofmem); }
+            len = sparse ? c->sorted_entries : c->entries;
+            for (j=0; j < len; ++j) {
+               unsigned int z = sparse ? c->sorted_values[j] : j;
+               unsigned int div=1;
+               for (k=0; k < c->dimensions; ++k) {
+                  int off = (z / div) % c->lookup_values;
+                  float val = mults[off]*c->delta_value + c->minimum_value + last;
+                  c->multiplicands[j*c->dimensions + k] = val;
+                  if (c->sequence_p)
+                     last = val;
+                  if (k+1 < c->dimensions) {
+                     if (div > UINT_MAX / (unsigned int) c->lookup_values) {
+                        setup_temp_free(f, mults,sizeof(mults[0])*c->lookup_values);
+                        return error(f, VORBIS_invalid_setup);
+                     }
+                     div *= c->lookup_values;
+                  }
+               }
+            }
+            c->lookup_type = 2;
+         }
+         else
+#endif
+         {
+            float last=0;
+            CHECK(f);
+            c->multiplicands = (codetype *) setup_malloc(f, sizeof(c->multiplicands[0]) * c->lookup_values);
+            if (c->multiplicands == NULL) { setup_temp_free(f, mults,sizeof(mults[0])*c->lookup_values); return error(f, VORBIS_outofmem); }
+            for (j=0; j < (int) c->lookup_values; ++j) {
+               float val = mults[j] * c->delta_value + c->minimum_value + last;
+               c->multiplicands[j] = val;
+               if (c->sequence_p)
+                  last = val;
+            }
+         }
+#ifndef STB_VORBIS_DIVIDES_IN_CODEBOOK
+        skip:;
+#endif
+         setup_temp_free(f, mults, sizeof(mults[0])*c->lookup_values);
+
+         CHECK(f);
+      }
+      CHECK(f);
+   }
+
+   // time domain transfers (not used)
+
+   x = get_bits(f, 6) + 1;
+   for (i=0; i < x; ++i) {
+      uint32 z = get_bits(f, 16);
+      if (z != 0) return error(f, VORBIS_invalid_setup);
+   }
+
+   // Floors
+   f->floor_count = get_bits(f, 6)+1;
+   f->floor_config = (Floor *)  setup_malloc(f, f->floor_count * sizeof(*f->floor_config));
+   if (f->floor_config == NULL) return error(f, VORBIS_outofmem);
+   for (i=0; i < f->floor_count; ++i) {
+      f->floor_types[i] = get_bits(f, 16);
+      if (f->floor_types[i] > 1) return error(f, VORBIS_invalid_setup);
+      if (f->floor_types[i] == 0) {
+         Floor0 *g = &f->floor_config[i].floor0;
+         g->order = get_bits(f,8);
+         g->rate = get_bits(f,16);
+         g->bark_map_size = get_bits(f,16);
+         g->amplitude_bits = get_bits(f,6);
+         g->amplitude_offset = get_bits(f,8);
+         g->number_of_books = get_bits(f,4) + 1;
+         for (j=0; j < g->number_of_books; ++j)
+            g->book_list[j] = get_bits(f,8);
+         return error(f, VORBIS_feature_not_supported);
+      } else {
+         stbv__floor_ordering p[31*8+2];
+         Floor1 *g = &f->floor_config[i].floor1;
+         int max_class = -1;
+         g->partitions = get_bits(f, 5);
+         for (j=0; j < g->partitions; ++j) {
+            g->partition_class_list[j] = get_bits(f, 4);
+            if (g->partition_class_list[j] > max_class)
+               max_class = g->partition_class_list[j];
+         }
+         for (j=0; j <= max_class; ++j) {
+            g->class_dimensions[j] = get_bits(f, 3)+1;
+            g->class_subclasses[j] = get_bits(f, 2);
+            if (g->class_subclasses[j]) {
+               g->class_masterbooks[j] = get_bits(f, 8);
+               if (g->class_masterbooks[j] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+            }
+            for (k=0; k < 1 << g->class_subclasses[j]; ++k) {
+               g->subclass_books[j][k] = (int16)get_bits(f,8)-1;
+               if (g->subclass_books[j][k] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+            }
+         }
+         g->floor1_multiplier = get_bits(f,2)+1;
+         g->rangebits = get_bits(f,4);
+         g->Xlist[0] = 0;
+         g->Xlist[1] = 1 << g->rangebits;
+         g->values = 2;
+         for (j=0; j < g->partitions; ++j) {
+            int c = g->partition_class_list[j];
+            for (k=0; k < g->class_dimensions[c]; ++k) {
+               g->Xlist[g->values] = get_bits(f, g->rangebits);
+               ++g->values;
+            }
+         }
+         // precompute the sorting
+         for (j=0; j < g->values; ++j) {
+            p[j].x = g->Xlist[j];
+            p[j].id = j;
+         }
+         qsort(p, g->values, sizeof(p[0]), point_compare);
+         for (j=0; j < g->values-1; ++j)
+            if (p[j].x == p[j+1].x)
+               return error(f, VORBIS_invalid_setup);
+         for (j=0; j < g->values; ++j)
+            g->sorted_order[j] = (uint8) p[j].id;
+         // precompute the neighbors
+         for (j=2; j < g->values; ++j) {
+            int low = 0,hi = 0;
+            neighbors(g->Xlist, j, &low,&hi);
+            g->neighbors[j][0] = low;
+            g->neighbors[j][1] = hi;
+         }
+
+         if (g->values > longest_floorlist)
+            longest_floorlist = g->values;
+      }
+   }
+
+   // Residue
+   f->residue_count = get_bits(f, 6)+1;
+   f->residue_config = (Residue *) setup_malloc(f, f->residue_count * sizeof(f->residue_config[0]));
+   if (f->residue_config == NULL) return error(f, VORBIS_outofmem);
+   memset(f->residue_config, 0, f->residue_count * sizeof(f->residue_config[0]));
+   for (i=0; i < f->residue_count; ++i) {
+      uint8 residue_cascade[64];
+      Residue *r = f->residue_config+i;
+      f->residue_types[i] = get_bits(f, 16);
+      if (f->residue_types[i] > 2) return error(f, VORBIS_invalid_setup);
+      r->begin = get_bits(f, 24);
+      r->end = get_bits(f, 24);
+      if (r->end < r->begin) return error(f, VORBIS_invalid_setup);
+      r->part_size = get_bits(f,24)+1;
+      r->classifications = get_bits(f,6)+1;
+      r->classbook = get_bits(f,8);
+      if (r->classbook >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+      for (j=0; j < r->classifications; ++j) {
+         uint8 high_bits=0;
+         uint8 low_bits=get_bits(f,3);
+         if (get_bits(f,1))
+            high_bits = get_bits(f,5);
+         residue_cascade[j] = high_bits*8 + low_bits;
+      }
+      r->residue_books = (short (*)[8]) setup_malloc(f, sizeof(r->residue_books[0]) * r->classifications);
+      if (r->residue_books == NULL) return error(f, VORBIS_outofmem);
+      for (j=0; j < r->classifications; ++j) {
+         for (k=0; k < 8; ++k) {
+            if (residue_cascade[j] & (1 << k)) {
+               r->residue_books[j][k] = get_bits(f, 8);
+               if (r->residue_books[j][k] >= f->codebook_count) return error(f, VORBIS_invalid_setup);
+            } else {
+               r->residue_books[j][k] = -1;
+            }
+         }
+      }
+      // precompute the classifications[] array to avoid inner-loop mod/divide
+      // call it 'classdata' since we already have r->classifications
+      r->classdata = (uint8 **) setup_malloc(f, sizeof(*r->classdata) * f->codebooks[r->classbook].entries);
+      if (!r->classdata) return error(f, VORBIS_outofmem);
+      memset(r->classdata, 0, sizeof(*r->classdata) * f->codebooks[r->classbook].entries);
+      for (j=0; j < f->codebooks[r->classbook].entries; ++j) {
+         int classwords = f->codebooks[r->classbook].dimensions;
+         int temp = j;
+         r->classdata[j] = (uint8 *) setup_malloc(f, sizeof(r->classdata[j][0]) * classwords);
+         if (r->classdata[j] == NULL) return error(f, VORBIS_outofmem);
+         for (k=classwords-1; k >= 0; --k) {
+            r->classdata[j][k] = temp % r->classifications;
+            temp /= r->classifications;
+         }
+      }
+   }
+
+   f->mapping_count = get_bits(f,6)+1;
+   f->mapping = (Mapping *) setup_malloc(f, f->mapping_count * sizeof(*f->mapping));
+   if (f->mapping == NULL) return error(f, VORBIS_outofmem);
+   memset(f->mapping, 0, f->mapping_count * sizeof(*f->mapping));
+   for (i=0; i < f->mapping_count; ++i) {
+      Mapping *m = f->mapping + i;
+      int mapping_type = get_bits(f,16);
+      if (mapping_type != 0) return error(f, VORBIS_invalid_setup);
+      m->chan = (MappingChannel *) setup_malloc(f, f->channels * sizeof(*m->chan));
+      if (m->chan == NULL) return error(f, VORBIS_outofmem);
+      if (get_bits(f,1))
+         m->submaps = get_bits(f,4)+1;
+      else
+         m->submaps = 1;
+      if (m->submaps > max_submaps)
+         max_submaps = m->submaps;
+      if (get_bits(f,1)) {
+         m->coupling_steps = get_bits(f,8)+1;
+         if (m->coupling_steps > f->channels) return error(f, VORBIS_invalid_setup);
+         for (k=0; k < m->coupling_steps; ++k) {
+            m->chan[k].magnitude = get_bits(f, ilog(f->channels-1));
+            m->chan[k].angle = get_bits(f, ilog(f->channels-1));
+            if (m->chan[k].magnitude >= f->channels)        return error(f, VORBIS_invalid_setup);
+            if (m->chan[k].angle     >= f->channels)        return error(f, VORBIS_invalid_setup);
+            if (m->chan[k].magnitude == m->chan[k].angle)   return error(f, VORBIS_invalid_setup);
+         }
+      } else
+         m->coupling_steps = 0;
+
+      // reserved field
+      if (get_bits(f,2)) return error(f, VORBIS_invalid_setup);
+      if (m->submaps > 1) {
+         for (j=0; j < f->channels; ++j) {
+            m->chan[j].mux = get_bits(f, 4);
+            if (m->chan[j].mux >= m->submaps)                return error(f, VORBIS_invalid_setup);
+         }
+      } else
+         // @SPECIFICATION: this case is missing from the spec
+         for (j=0; j < f->channels; ++j)
+            m->chan[j].mux = 0;
+
+      for (j=0; j < m->submaps; ++j) {
+         get_bits(f,8); // discard
+         m->submap_floor[j] = get_bits(f,8);
+         m->submap_residue[j] = get_bits(f,8);
+         if (m->submap_floor[j] >= f->floor_count)      return error(f, VORBIS_invalid_setup);
+         if (m->submap_residue[j] >= f->residue_count)  return error(f, VORBIS_invalid_setup);
+      }
+   }
+
+   // Modes
+   f->mode_count = get_bits(f, 6)+1;
+   for (i=0; i < f->mode_count; ++i) {
+      Mode *m = f->mode_config+i;
+      m->blockflag = get_bits(f,1);
+      m->windowtype = get_bits(f,16);
+      m->transformtype = get_bits(f,16);
+      m->mapping = get_bits(f,8);
+      if (m->windowtype != 0)                 return error(f, VORBIS_invalid_setup);
+      if (m->transformtype != 0)              return error(f, VORBIS_invalid_setup);
+      if (m->mapping >= f->mapping_count)     return error(f, VORBIS_invalid_setup);
+   }
+
+   flush_packet(f);
+
+   f->previous_length = 0;
+
+   for (i=0; i < f->channels; ++i) {
+      f->channel_buffers[i] = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1);
+      f->previous_window[i] = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1/2);
+      f->finalY[i]          = (int16 *) setup_malloc(f, sizeof(int16) * longest_floorlist);
+      if (f->channel_buffers[i] == NULL || f->previous_window[i] == NULL || f->finalY[i] == NULL) return error(f, VORBIS_outofmem);
+      memset(f->channel_buffers[i], 0, sizeof(float) * f->blocksize_1);
+      #ifdef STB_VORBIS_NO_DEFER_FLOOR
+      f->floor_buffers[i]   = (float *) setup_malloc(f, sizeof(float) * f->blocksize_1/2);
+      if (f->floor_buffers[i] == NULL) return error(f, VORBIS_outofmem);
+      #endif
+   }
+
+   if (!init_blocksize(f, 0, f->blocksize_0)) return FALSE;
+   if (!init_blocksize(f, 1, f->blocksize_1)) return FALSE;
+   f->blocksize[0] = f->blocksize_0;
+   f->blocksize[1] = f->blocksize_1;
+
+#ifdef STB_VORBIS_DIVIDE_TABLE
+   if (integer_divide_table[1][1]==0)
+      for (i=0; i < DIVTAB_NUMER; ++i)
+         for (j=1; j < DIVTAB_DENOM; ++j)
+            integer_divide_table[i][j] = i / j;
+#endif
+
+   // compute how much temporary memory is needed
+
+   // 1.
+   {
+      uint32 imdct_mem = (f->blocksize_1 * sizeof(float) >> 1);
+      uint32 classify_mem;
+      int i,max_part_read=0;
+      for (i=0; i < f->residue_count; ++i) {
+         Residue *r = f->residue_config + i;
+         unsigned int actual_size = f->blocksize_1 / 2;
+         unsigned int limit_r_begin = r->begin < actual_size ? r->begin : actual_size;
+         unsigned int limit_r_end   = r->end   < actual_size ? r->end   : actual_size;
+         int n_read = limit_r_end - limit_r_begin;
+         int part_read = n_read / r->part_size;
+         if (part_read > max_part_read)
+            max_part_read = part_read;
+      }
+      #ifndef STB_VORBIS_DIVIDES_IN_RESIDUE
+      classify_mem = f->channels * (sizeof(void*) + max_part_read * sizeof(uint8 *));
+      #else
+      classify_mem = f->channels * (sizeof(void*) + max_part_read * sizeof(int *));
+      #endif
+
+      // maximum reasonable partition size is f->blocksize_1
+
+      f->temp_memory_required = classify_mem;
+      if (imdct_mem > f->temp_memory_required)
+         f->temp_memory_required = imdct_mem;
+   }
+
+
+   if (f->alloc.alloc_buffer) {
+      assert(f->temp_offset == f->alloc.alloc_buffer_length_in_bytes);
+      // check if there's enough temp memory so we don't error later
+      if (f->setup_offset + sizeof(*f) + f->temp_memory_required > (unsigned) f->temp_offset)
+         return error(f, VORBIS_outofmem);
+   }
+
+   // @TODO: stb_vorbis_seek_start expects first_audio_page_offset to point to a page
+   // without PAGEFLAG_continued_packet, so this either points to the first page, or
+   // the page after the end of the headers. It might be cleaner to point to a page
+   // in the middle of the headers, when that's the page where the first audio packet
+   // starts, but we'd have to also correctly skip the end of any continued packet in
+   // stb_vorbis_seek_start.
+   if (f->next_seg == -1) {
+      f->first_audio_page_offset = stb_vorbis_get_file_offset(f);
+   } else {
+      f->first_audio_page_offset = 0;
+   }
+
+   return TRUE;
+}
+
+static void vorbis_deinit(stb_vorbis *p)
+{
+   int i,j;
+
+   setup_free(p, p->vendor);
+   for (i=0; i < p->comment_list_length; ++i) {
+      setup_free(p, p->comment_list[i]);
+   }
+   setup_free(p, p->comment_list);
+
+   if (p->residue_config) {
+      for (i=0; i < p->residue_count; ++i) {
+         Residue *r = p->residue_config+i;
+         if (r->classdata) {
+            for (j=0; j < p->codebooks[r->classbook].entries; ++j)
+               setup_free(p, r->classdata[j]);
+            setup_free(p, r->classdata);
+         }
+         setup_free(p, r->residue_books);
+      }
+   }
+
+   if (p->codebooks) {
+      CHECK(p);
+      for (i=0; i < p->codebook_count; ++i) {
+         Codebook *c = p->codebooks + i;
+         setup_free(p, c->codeword_lengths);
+         setup_free(p, c->multiplicands);
+         setup_free(p, c->codewords);
+         setup_free(p, c->sorted_codewords);
+         // c->sorted_values[-1] is the first entry in the array
+         setup_free(p, c->sorted_values ? c->sorted_values-1 : NULL);
+      }
+      setup_free(p, p->codebooks);
+   }
+   setup_free(p, p->floor_config);
+   setup_free(p, p->residue_config);
+   if (p->mapping) {
+      for (i=0; i < p->mapping_count; ++i)
+         setup_free(p, p->mapping[i].chan);
+      setup_free(p, p->mapping);
+   }
+   CHECK(p);
+   for (i=0; i < p->channels && i < STB_VORBIS_MAX_CHANNELS; ++i) {
+      setup_free(p, p->channel_buffers[i]);
+      setup_free(p, p->previous_window[i]);
+      #ifdef STB_VORBIS_NO_DEFER_FLOOR
+      setup_free(p, p->floor_buffers[i]);
+      #endif
+      setup_free(p, p->finalY[i]);
+   }
+   for (i=0; i < 2; ++i) {
+      setup_free(p, p->A[i]);
+      setup_free(p, p->B[i]);
+      setup_free(p, p->C[i]);
+      setup_free(p, p->window[i]);
+      setup_free(p, p->bit_reverse[i]);
+   }
+   #ifndef STB_VORBIS_NO_STDIO
+   if (p->close_on_free) fclose(p->f);
+   #endif
+}
+
+void stb_vorbis_close(stb_vorbis *p)
+{
+   if (p == NULL) return;
+   vorbis_deinit(p);
+   setup_free(p,p);
+}
+
+static void vorbis_init(stb_vorbis *p, const stb_vorbis_alloc *z)
+{
+   memset(p, 0, sizeof(*p)); // NULL out all malloc'd pointers to start
+   if (z) {
+      p->alloc = *z;
+      p->alloc.alloc_buffer_length_in_bytes &= ~7;
+      p->temp_offset = p->alloc.alloc_buffer_length_in_bytes;
+   }
+   p->eof = 0;
+   p->error = VORBIS__no_error;
+   p->stream = NULL;
+   p->codebooks = NULL;
+   p->page_crc_tests = -1;
+   #ifndef STB_VORBIS_NO_STDIO
+   p->close_on_free = FALSE;
+   p->f = NULL;
+   #endif
+}
+
+int stb_vorbis_get_sample_offset(stb_vorbis *f)
+{
+   if (f->current_loc_valid)
+      return f->current_loc;
+   else
+      return -1;
+}
+
+stb_vorbis_info stb_vorbis_get_info(stb_vorbis *f)
+{
+   stb_vorbis_info d;
+   d.channels = f->channels;
+   d.sample_rate = f->sample_rate;
+   d.setup_memory_required = f->setup_memory_required;
+   d.setup_temp_memory_required = f->setup_temp_memory_required;
+   d.temp_memory_required = f->temp_memory_required;
+   d.max_frame_size = f->blocksize_1 >> 1;
+   return d;
+}
+
+stb_vorbis_comment stb_vorbis_get_comment(stb_vorbis *f)
+{
+   stb_vorbis_comment d;
+   d.vendor = f->vendor;
+   d.comment_list_length = f->comment_list_length;
+   d.comment_list = f->comment_list;
+   return d;
+}
+
+int stb_vorbis_get_error(stb_vorbis *f)
+{
+   int e = f->error;
+   f->error = VORBIS__no_error;
+   return e;
+}
+
+static stb_vorbis * vorbis_alloc(stb_vorbis *f)
+{
+   stb_vorbis *p = (stb_vorbis *) setup_malloc(f, sizeof(*p));
+   return p;
+}
+
+#ifndef STB_VORBIS_NO_PUSHDATA_API
+
+void stb_vorbis_flush_pushdata(stb_vorbis *f)
+{
+   f->previous_length = 0;
+   f->page_crc_tests  = 0;
+   f->discard_samples_deferred = 0;
+   f->current_loc_valid = FALSE;
+   f->first_decode = FALSE;
+   f->samples_output = 0;
+   f->channel_buffer_start = 0;
+   f->channel_buffer_end = 0;
+}
+
+static int vorbis_search_for_page_pushdata(vorb *f, uint8 *data, int data_len)
+{
+   int i,n;
+   for (i=0; i < f->page_crc_tests; ++i)
+      f->scan[i].bytes_done = 0;
+
+   // if we have room for more scans, search for them first, because
+   // they may cause us to stop early if their header is incomplete
+   if (f->page_crc_tests < STB_VORBIS_PUSHDATA_CRC_COUNT) {
+      if (data_len < 4) return 0;
+      data_len -= 3; // need to look for 4-byte sequence, so don't miss
+                     // one that straddles a boundary
+      for (i=0; i < data_len; ++i) {
+         if (data[i] == 0x4f) {
+            if (0==memcmp(data+i, ogg_page_header, 4)) {
+               int j,len;
+               uint32 crc;
+               // make sure we have the whole page header
+               if (i+26 >= data_len || i+27+data[i+26] >= data_len) {
+                  // only consume up to the start of this page, so that the
+                  // whole page header is hopefully available next time
+                  data_len = i;
+                  break;
+               }
+               // ok, we have it all; compute the length of the page
+               len = 27 + data[i+26];
+               for (j=0; j < data[i+26]; ++j)
+                  len += data[i+27+j];
+               // scan everything up to the embedded crc (which must be treated as 0)
+               crc = 0;
+               for (j=0; j < 22; ++j)
+                  crc = crc32_update(crc, data[i+j]);
+               // now process 4 0-bytes
+               for (   ; j < 26; ++j)
+                  crc = crc32_update(crc, 0);
+               // len is the total number of bytes we need to scan
+               n = f->page_crc_tests++;
+               f->scan[n].bytes_left = len-j;
+               f->scan[n].crc_so_far = crc;
+               f->scan[n].goal_crc = data[i+22] + (data[i+23] << 8) + (data[i+24]<<16) + ((uint32) data[i+25]<<24);
+               // if the last frame on a page is continued to the next, then
+               // we can't recover the sample_loc immediately
+               if (data[i+27+data[i+26]-1] == 255)
+                  f->scan[n].sample_loc = ~0;
+               else
+                  f->scan[n].sample_loc = data[i+6] + (data[i+7] << 8) + (data[i+ 8]<<16) + ((uint32) data[i+ 9]<<24);
+               f->scan[n].bytes_done = i+j;
+               if (f->page_crc_tests == STB_VORBIS_PUSHDATA_CRC_COUNT)
+                  break;
+               // keep going if we still have room for more
+            }
+         }
+      }
+   }
+
+   for (i=0; i < f->page_crc_tests;) {
+      uint32 crc;
+      int j;
+      int n = f->scan[i].bytes_done;
+      int m = f->scan[i].bytes_left;
+      if (m > data_len - n) m = data_len - n;
+      // m is the bytes to scan in the current chunk
+      crc = f->scan[i].crc_so_far;
+      for (j=0; j < m; ++j)
+         crc = crc32_update(crc, data[n+j]);
+      f->scan[i].bytes_left -= m;
+      f->scan[i].crc_so_far = crc;
+      if (f->scan[i].bytes_left == 0) {
+         // does it match?
+         if (f->scan[i].crc_so_far == f->scan[i].goal_crc) {
+            // Houston, we have page
+            data_len = n+m; // consumption amount is wherever that scan ended
+            f->page_crc_tests = -1; // drop out of page scan mode
+            f->previous_length = 0; // decode-but-don't-output one frame
+            f->next_seg = -1;       // start a new page
+            f->current_loc = f->scan[i].sample_loc; // set the current sample location
+                                    // to the amount we'd have decoded had we decoded this page
+            f->current_loc_valid = f->current_loc != ~0U;
+            return data_len;
+         }
+         // delete entry
+         f->scan[i] = f->scan[--f->page_crc_tests];
+      } else {
+         ++i;
+      }
+   }
+
+   return data_len;
+}
+
+// return value: number of bytes we used
+int stb_vorbis_decode_frame_pushdata(
+         stb_vorbis *f,                   // the file we're decoding
+         const uint8 *data, int data_len, // the memory available for decoding
+         int *channels,                   // place to write number of float * buffers
+         float ***output,                 // place to write float ** array of float * buffers
+         int *samples                     // place to write number of output samples
+     )
+{
+   int i;
+   int len,right,left;
+
+   if (!IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+   if (f->page_crc_tests >= 0) {
+      *samples = 0;
+      return vorbis_search_for_page_pushdata(f, (uint8 *) data, data_len);
+   }
+
+   f->stream     = (uint8 *) data;
+   f->stream_end = (uint8 *) data + data_len;
+   f->error      = VORBIS__no_error;
+
+   // check that we have the entire packet in memory
+   if (!is_whole_packet_present(f)) {
+      *samples = 0;
+      return 0;
+   }
+
+   if (!vorbis_decode_packet(f, &len, &left, &right)) {
+      // save the actual error we encountered
+      enum STBVorbisError error = f->error;
+      if (error == VORBIS_bad_packet_type) {
+         // flush and resynch
+         f->error = VORBIS__no_error;
+         while (get8_packet(f) != EOP)
+            if (f->eof) break;
+         *samples = 0;
+         return (int) (f->stream - data);
+      }
+      if (error == VORBIS_continued_packet_flag_invalid) {
+         if (f->previous_length == 0) {
+            // we may be resynching, in which case it's ok to hit one
+            // of these; just discard the packet
+            f->error = VORBIS__no_error;
+            while (get8_packet(f) != EOP)
+               if (f->eof) break;
+            *samples = 0;
+            return (int) (f->stream - data);
+         }
+      }
+      // any other parse error: decoding cannot continue from the current
+      // position, so flush the decoder state and force a page resync
+      stb_vorbis_flush_pushdata(f);
+      // restore the error that actually made us bail
+      f->error = error;
+      *samples = 0;
+      return 1;
+   }
+
+   // success!
+   len = vorbis_finish_frame(f, len, left, right);
+   for (i=0; i < f->channels; ++i)
+      f->outputs[i] = f->channel_buffers[i] + left;
+
+   if (channels) *channels = f->channels;
+   *samples = len;
+   *output = f->outputs;
+   return (int) (f->stream - data);
+}
+
+stb_vorbis *stb_vorbis_open_pushdata(
+         const unsigned char *data, int data_len, // the memory available for decoding
+         int *data_used,              // only defined if result is not NULL
+         int *error, const stb_vorbis_alloc *alloc)
+{
+   stb_vorbis *f, p;
+   vorbis_init(&p, alloc);
+   p.stream     = (uint8 *) data;
+   p.stream_end = (uint8 *) data + data_len;
+   p.push_mode  = TRUE;
+   if (!start_decoder(&p)) {
+      if (p.eof)
+         *error = VORBIS_need_more_data;
+      else
+         *error = p.error;
+      vorbis_deinit(&p);
+      return NULL;
+   }
+   f = vorbis_alloc(&p);
+   if (f) {
+      *f = p;
+      *data_used = (int) (f->stream - data);
+      *error = 0;
+      return f;
+   } else {
+      vorbis_deinit(&p);
+      return NULL;
+   }
+}
+#endif // STB_VORBIS_NO_PUSHDATA_API
+
+unsigned int stb_vorbis_get_file_offset(stb_vorbis *f)
+{
+   #ifndef STB_VORBIS_NO_PUSHDATA_API
+   if (f->push_mode) return 0;
+   #endif
+   if (USE_MEMORY(f)) return (unsigned int) (f->stream - f->stream_start);
+   #ifndef STB_VORBIS_NO_STDIO
+   return (unsigned int) (ftell(f->f) - f->f_start);
+   #endif
+}
+
+#ifndef STB_VORBIS_NO_PULLDATA_API
+//
+// DATA-PULLING API
+//
+
+static uint32 vorbis_find_page(stb_vorbis *f, uint32 *end, uint32 *last)
+{
+   for(;;) {
+      int n;
+      if (f->eof) return 0;
+      n = get8(f);
+      if (n == 0x4f) { // page header candidate
+         unsigned int retry_loc = stb_vorbis_get_file_offset(f);
+         int i;
+         // check if we're off the end of a file_section stream
+         if (retry_loc - 25 > f->stream_len)
+            return 0;
+         // check the rest of the header
+         for (i=1; i < 4; ++i)
+            if (get8(f) != ogg_page_header[i])
+               break;
+         if (f->eof) return 0;
+         if (i == 4) {
+            uint8 header[27];
+            uint32 i, crc, goal, len;
+            for (i=0; i < 4; ++i)
+               header[i] = ogg_page_header[i];
+            for (; i < 27; ++i)
+               header[i] = get8(f);
+            if (f->eof) return 0;
+            if (header[4] != 0) goto invalid;
+            goal = header[22] + (header[23] << 8) + (header[24]<<16) + ((uint32)header[25]<<24);
+            for (i=22; i < 26; ++i)
+               header[i] = 0;
+            crc = 0;
+            for (i=0; i < 27; ++i)
+               crc = crc32_update(crc, header[i]);
+            len = 0;
+            for (i=0; i < header[26]; ++i) {
+               int s = get8(f);
+               crc = crc32_update(crc, s);
+               len += s;
+            }
+            if (len && f->eof) return 0;
+            for (i=0; i < len; ++i)
+               crc = crc32_update(crc, get8(f));
+            // finished parsing probable page
+            if (crc == goal) {
+               // we could now check that it's either got the last
+               // page flag set, OR it's followed by the capture
+               // pattern, but I guess TECHNICALLY you could have
+               // a file with garbage between each ogg page and recover
+               // from it automatically? So even though that paranoia
+               // might decrease the chance of an invalid decode by
+               // another 2^32, not worth it since it would hose those
+               // invalid-but-useful files?
+               if (end)
+                  *end = stb_vorbis_get_file_offset(f);
+               if (last) {
+                  if (header[5] & 0x04)
+                     *last = 1;
+                  else
+                     *last = 0;
+               }
+               set_file_offset(f, retry_loc-1);
+               return 1;
+            }
+         }
+        invalid:
+         // not a valid page, so rewind and look for next one
+         set_file_offset(f, retry_loc);
+      }
+   }
+}
+
+
+#define SAMPLE_unknown  0xffffffff
+
+// seeking is implemented with a binary search, which narrows down the range to
+// 64K, before using a linear search (because finding the synchronization
+// pattern can be expensive, and the chance we'd find the end page again is
+// relatively high for small ranges)
+//
+// two initial interpolation-style probes are used at the start of the search
+// to try to bound either side of the binary search sensibly, while still
+// working in O(log n) time if they fail.
+
+static int get_seek_page_info(stb_vorbis *f, ProbedPage *z)
+{
+   uint8 header[27], lacing[255];
+   int i,len;
+
+   // record where the page starts
+   z->page_start = stb_vorbis_get_file_offset(f);
+
+   // parse the header
+   getn(f, header, 27);
+   if (header[0] != 'O' || header[1] != 'g' || header[2] != 'g' || header[3] != 'S')
+      return 0;
+   getn(f, lacing, header[26]);
+
+   // determine the length of the payload
+   len = 0;
+   for (i=0; i < header[26]; ++i)
+      len += lacing[i];
+
+   // this implies where the page ends
+   z->page_end = z->page_start + 27 + header[26] + len;
+
+   // read the last-decoded sample out of the data
+   z->last_decoded_sample = header[6] + (header[7] << 8) + (header[8] << 16) + ((uint32) header[9] << 24);
+
+   // restore file state to where we were
+   set_file_offset(f, z->page_start);
+   return 1;
+}
+
+// rarely used function to seek back to the preceding page while finding the
+// start of a packet
+static int go_to_page_before(stb_vorbis *f, unsigned int limit_offset)
+{
+   unsigned int previous_safe, end;
+
+   // now we want to seek back 64K from the limit
+   if (limit_offset >= 65536 && limit_offset-65536 >= f->first_audio_page_offset)
+      previous_safe = limit_offset - 65536;
+   else
+      previous_safe = f->first_audio_page_offset;
+
+   set_file_offset(f, previous_safe);
+
+   while (vorbis_find_page(f, &end, NULL)) {
+      if (end >= limit_offset && stb_vorbis_get_file_offset(f) < limit_offset)
+         return 1;
+      set_file_offset(f, end);
+   }
+
+   return 0;
+}
+
+// implements the search logic for finding a page and starting decoding. if
+// the function succeeds, current_loc_valid will be true and current_loc will
+// be less than or equal to the provided sample number (the closer the
+// better).
+static int seek_to_sample_coarse(stb_vorbis *f, uint32 sample_number)
+{
+   ProbedPage left, right, mid;
+   int i, start_seg_with_known_loc, end_pos, page_start;
+   uint32 delta, stream_length, padding, last_sample_limit;
+   double offset = 0.0, bytes_per_sample = 0.0;
+   int probe = 0;
+
+   // find the last page and validate the target sample
+   stream_length = stb_vorbis_stream_length_in_samples(f);
+   if (stream_length == 0)            return error(f, VORBIS_seek_without_length);
+   if (sample_number > stream_length) return error(f, VORBIS_seek_invalid);
+
+   // this is the maximum difference between the window-center (which is the
+   // actual granule position value), and the right-start (which the spec
+   // indicates should be the granule position (give or take one)).
+   padding = ((f->blocksize_1 - f->blocksize_0) >> 2);
+   if (sample_number < padding)
+      last_sample_limit = 0;
+   else
+      last_sample_limit = sample_number - padding;
+
+   left = f->p_first;
+   while (left.last_decoded_sample == ~0U) {
+      // (untested) the first page does not have a 'last_decoded_sample'
+      set_file_offset(f, left.page_end);
+      if (!get_seek_page_info(f, &left)) goto error;
+   }
+
+   right = f->p_last;
+   assert(right.last_decoded_sample != ~0U);
+
+   // starting from the start is handled differently
+   if (last_sample_limit <= left.last_decoded_sample) {
+      if (stb_vorbis_seek_start(f)) {
+         if (f->current_loc > sample_number)
+            return error(f, VORBIS_seek_failed);
+         return 1;
+      }
+      return 0;
+   }
+
+   while (left.page_end != right.page_start) {
+      assert(left.page_end < right.page_start);
+      // search range in bytes
+      delta = right.page_start - left.page_end;
+      if (delta <= 65536) {
+         // there's only 64K left to search - handle it linearly
+         set_file_offset(f, left.page_end);
+      } else {
+         if (probe < 2) {
+            if (probe == 0) {
+               // first probe (interpolate)
+               double data_bytes = right.page_end - left.page_start;
+               bytes_per_sample = data_bytes / right.last_decoded_sample;
+               offset = left.page_start + bytes_per_sample * (last_sample_limit - left.last_decoded_sample);
+            } else {
+               // second probe (try to bound the other side)
+               double error = ((double) last_sample_limit - mid.last_decoded_sample) * bytes_per_sample;
+               if (error >= 0 && error <  8000) error =  8000;
+               if (error <  0 && error > -8000) error = -8000;
+               offset += error * 2;
+            }
+
+            // ensure the offset is valid
+            if (offset < left.page_end)
+               offset = left.page_end;
+            if (offset > right.page_start - 65536)
+               offset = right.page_start - 65536;
+
+            set_file_offset(f, (unsigned int) offset);
+         } else {
+            // binary search for large ranges (offset by 32K to ensure
+            // we don't hit the right page)
+            set_file_offset(f, left.page_end + (delta / 2) - 32768);
+         }
+
+         if (!vorbis_find_page(f, NULL, NULL)) goto error;
+      }
+
+      for (;;) {
+         if (!get_seek_page_info(f, &mid)) goto error;
+         if (mid.last_decoded_sample != ~0U) break;
+         // (untested) no frames end on this page
+         set_file_offset(f, mid.page_end);
+         assert(mid.page_start < right.page_start);
+      }
+
+      // if we've just found the last page again then we're in a tricky file,
+      // and we're close enough (if it wasn't an interpolation probe).
+      if (mid.page_start == right.page_start) {
+         if (probe >= 2 || delta <= 65536)
+            break;
+      } else {
+         if (last_sample_limit < mid.last_decoded_sample)
+            right = mid;
+         else
+            left = mid;
+      }
+
+      ++probe;
+   }
+
+   // seek back to start of the last packet
+   page_start = left.page_start;
+   set_file_offset(f, page_start);
+   if (!start_page(f)) return error(f, VORBIS_seek_failed);
+   end_pos = f->end_seg_with_known_loc;
+   assert(end_pos >= 0);
+
+   for (;;) {
+      for (i = end_pos; i > 0; --i)
+         if (f->segments[i-1] != 255)
+            break;
+
+      start_seg_with_known_loc = i;
+
+      if (start_seg_with_known_loc > 0 || !(f->page_flag & PAGEFLAG_continued_packet))
+         break;
+
+      // (untested) the final packet begins on an earlier page
+      if (!go_to_page_before(f, page_start))
+         goto error;
+
+      page_start = stb_vorbis_get_file_offset(f);
+      if (!start_page(f)) goto error;
+      end_pos = f->segment_count - 1;
+   }
+
+   // prepare to start decoding
+   f->current_loc_valid = FALSE;
+   f->last_seg = FALSE;
+   f->valid_bits = 0;
+   f->packet_bytes = 0;
+   f->bytes_in_seg = 0;
+   f->previous_length = 0;
+   f->next_seg = start_seg_with_known_loc;
+
+   for (i = 0; i < start_seg_with_known_loc; i++)
+      skip(f, f->segments[i]);
+
+   // start decoding (optimizable - this frame is generally discarded)
+   if (!vorbis_pump_first_frame(f))
+      return 0;
+   if (f->current_loc > sample_number)
+      return error(f, VORBIS_seek_failed);
+   return 1;
+
+error:
+   // try to restore the file to a valid state
+   stb_vorbis_seek_start(f);
+   return error(f, VORBIS_seek_failed);
+}
+
+// the same as vorbis_decode_initial, but without advancing
+static int peek_decode_initial(vorb *f, int *p_left_start, int *p_left_end, int *p_right_start, int *p_right_end, int *mode)
+{
+   int bits_read, bytes_read;
+
+   if (!vorbis_decode_initial(f, p_left_start, p_left_end, p_right_start, p_right_end, mode))
+      return 0;
+
+   // either 1 or 2 bytes were read, figure out which so we can rewind
+   bits_read = 1 + ilog(f->mode_count-1);
+   if (f->mode_config[*mode].blockflag)
+      bits_read += 2;
+   bytes_read = (bits_read + 7) / 8;
+
+   f->bytes_in_seg += bytes_read;
+   f->packet_bytes -= bytes_read;
+   skip(f, -bytes_read);
+   if (f->next_seg == -1)
+      f->next_seg = f->segment_count - 1;
+   else
+      f->next_seg--;
+   f->valid_bits = 0;
+
+   return 1;
+}
+
+int stb_vorbis_seek_frame(stb_vorbis *f, unsigned int sample_number)
+{
+   uint32 max_frame_samples;
+
+   if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+   // fast page-level search
+   if (!seek_to_sample_coarse(f, sample_number))
+      return 0;
+
+   assert(f->current_loc_valid);
+   assert(f->current_loc <= sample_number);
+
+   // linear search for the relevant packet
+   max_frame_samples = (f->blocksize_1*3 - f->blocksize_0) >> 2;
+   while (f->current_loc < sample_number) {
+      int left_start, left_end, right_start, right_end, mode, frame_samples;
+      if (!peek_decode_initial(f, &left_start, &left_end, &right_start, &right_end, &mode))
+         return error(f, VORBIS_seek_failed);
+      // calculate the number of samples returned by the next frame
+      frame_samples = right_start - left_start;
+      if (f->current_loc + frame_samples > sample_number) {
+         return 1; // the next frame will contain the sample
+      } else if (f->current_loc + frame_samples + max_frame_samples > sample_number) {
+         // there's a chance the frame after this could contain the sample
+         vorbis_pump_first_frame(f);
+      } else {
+         // this frame is too early to be relevant
+         f->current_loc += frame_samples;
+         f->previous_length = 0;
+         maybe_start_packet(f);
+         flush_packet(f);
+      }
+   }
+   // the next frame should start with the sample
+   if (f->current_loc != sample_number) return error(f, VORBIS_seek_failed);
+   return 1;
+}
+
+int stb_vorbis_seek(stb_vorbis *f, unsigned int sample_number)
+{
+   if (!stb_vorbis_seek_frame(f, sample_number))
+      return 0;
+
+   if (sample_number != f->current_loc) {
+      int n;
+      uint32 frame_start = f->current_loc;
+      stb_vorbis_get_frame_float(f, &n, NULL);
+      assert(sample_number > frame_start);
+      assert(f->channel_buffer_start + (int) (sample_number-frame_start) <= f->channel_buffer_end);
+      f->channel_buffer_start += (sample_number - frame_start);
+   }
+
+   return 1;
+}
+
+int stb_vorbis_seek_start(stb_vorbis *f)
+{
+   if (IS_PUSH_MODE(f)) { return error(f, VORBIS_invalid_api_mixing); }
+   set_file_offset(f, f->first_audio_page_offset);
+   f->previous_length = 0;
+   f->first_decode = TRUE;
+   f->next_seg = -1;
+   return vorbis_pump_first_frame(f);
+}
+
+unsigned int stb_vorbis_stream_length_in_samples(stb_vorbis *f)
+{
+   unsigned int restore_offset, previous_safe;
+   unsigned int end, last_page_loc;
+
+   if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+   if (!f->total_samples) {
+      unsigned int last;
+      uint32 lo,hi;
+      char header[6];
+
+      // first, store the current decode position so we can restore it
+      restore_offset = stb_vorbis_get_file_offset(f);
+
+      // now we want to seek back 64K from the end (the last page must
+      // be at most a little less than 64K, but let's allow a little slop)
+      if (f->stream_len >= 65536 && f->stream_len-65536 >= f->first_audio_page_offset)
+         previous_safe = f->stream_len - 65536;
+      else
+         previous_safe = f->first_audio_page_offset;
+
+      set_file_offset(f, previous_safe);
+      // previous_safe is now our candidate 'earliest known place that seeking
+      // to will lead to the final page'
+
+      if (!vorbis_find_page(f, &end, &last)) {
+         // if we can't find a page, we're hosed!
+         f->error = VORBIS_cant_find_last_page;
+         f->total_samples = 0xffffffff;
+         goto done;
+      }
+
+      // check if there are more pages
+      last_page_loc = stb_vorbis_get_file_offset(f);
+
+      // stop when the last_page flag is set, not when we reach eof;
+      // this allows us to stop short of a 'file_section' end without
+      // explicitly checking the length of the section
+      while (!last) {
+         set_file_offset(f, end);
+         if (!vorbis_find_page(f, &end, &last)) {
+            // the last page we found didn't have the 'last page' flag
+            // set. whoops!
+            break;
+         }
+         //previous_safe = last_page_loc+1; // NOTE: not used after this point, but note for debugging
+         last_page_loc = stb_vorbis_get_file_offset(f);
+      }
+
+      set_file_offset(f, last_page_loc);
+
+      // parse the header
+      getn(f, (unsigned char *)header, 6);
+      // extract the absolute granule position
+      lo = get32(f);
+      hi = get32(f);
+      if (lo == 0xffffffff && hi == 0xffffffff) {
+         f->error = VORBIS_cant_find_last_page;
+         f->total_samples = SAMPLE_unknown;
+         goto done;
+      }
+      if (hi)
+         lo = 0xfffffffe; // saturate
+      f->total_samples = lo;
+
+      f->p_last.page_start = last_page_loc;
+      f->p_last.page_end   = end;
+      f->p_last.last_decoded_sample = lo;
+
+     done:
+      set_file_offset(f, restore_offset);
+   }
+   return f->total_samples == SAMPLE_unknown ? 0 : f->total_samples;
+}
+
+float stb_vorbis_stream_length_in_seconds(stb_vorbis *f)
+{
+   return stb_vorbis_stream_length_in_samples(f) / (float) f->sample_rate;
+}
+
+
+
+int stb_vorbis_get_frame_float(stb_vorbis *f, int *channels, float ***output)
+{
+   int len, right,left,i;
+   if (IS_PUSH_MODE(f)) return error(f, VORBIS_invalid_api_mixing);
+
+   if (!vorbis_decode_packet(f, &len, &left, &right)) {
+      f->channel_buffer_start = f->channel_buffer_end = 0;
+      return 0;
+   }
+
+   len = vorbis_finish_frame(f, len, left, right);
+   for (i=0; i < f->channels; ++i)
+      f->outputs[i] = f->channel_buffers[i] + left;
+
+   f->channel_buffer_start = left;
+   f->channel_buffer_end   = left+len;
+
+   if (channels) *channels = f->channels;
+   if (output)   *output = f->outputs;
+   return len;
+}
+
+#ifndef STB_VORBIS_NO_STDIO
+
+stb_vorbis * stb_vorbis_open_file_section(FILE *file, int close_on_free, int *error, const stb_vorbis_alloc *alloc, unsigned int length)
+{
+   stb_vorbis *f, p;
+   vorbis_init(&p, alloc);
+   p.f = file;
+   p.f_start = (uint32) ftell(file);
+   p.stream_len   = length;
+   p.close_on_free = close_on_free;
+   if (start_decoder(&p)) {
+      f = vorbis_alloc(&p);
+      if (f) {
+         *f = p;
+         vorbis_pump_first_frame(f);
+         return f;
+      }
+   }
+   if (error) *error = p.error;
+   vorbis_deinit(&p);
+   return NULL;
+}
+
+stb_vorbis * stb_vorbis_open_file(FILE *file, int close_on_free, int *error, const stb_vorbis_alloc *alloc)
+{
+   unsigned int len, start;
+   start = (unsigned int) ftell(file);
+   fseek(file, 0, SEEK_END);
+   len = (unsigned int) (ftell(file) - start);
+   fseek(file, start, SEEK_SET);
+   return stb_vorbis_open_file_section(file, close_on_free, error, alloc, len);
+}
+
+stb_vorbis * stb_vorbis_open_filename(const char *filename, int *error, const stb_vorbis_alloc *alloc)
+{
+   FILE *f;
+#if defined(_WIN32) && defined(__STDC_WANT_SECURE_LIB__)
+   if (0 != fopen_s(&f, filename, "rb"))
+      f = NULL;
+#else
+   f = fopen(filename, "rb");
+#endif
+   if (f)
+      return stb_vorbis_open_file(f, TRUE, error, alloc);
+   if (error) *error = VORBIS_file_open_failure;
+   return NULL;
+}
+#endif // STB_VORBIS_NO_STDIO
+
+stb_vorbis * stb_vorbis_open_memory(const unsigned char *data, int len, int *error, const stb_vorbis_alloc *alloc)
+{
+   stb_vorbis *f, p;
+   if (!data) {
+      if (error) *error = VORBIS_unexpected_eof;
+      return NULL;
+   }
+   vorbis_init(&p, alloc);
+   p.stream = (uint8 *) data;
+   p.stream_end = (uint8 *) data + len;
+   p.stream_start = (uint8 *) p.stream;
+   p.stream_len = len;
+   p.push_mode = FALSE;
+   if (start_decoder(&p)) {
+      f = vorbis_alloc(&p);
+      if (f) {
+         *f = p;
+         vorbis_pump_first_frame(f);
+         if (error) *error = VORBIS__no_error;
+         return f;
+      }
+   }
+   if (error) *error = p.error;
+   vorbis_deinit(&p);
+   return NULL;
+}
+
+#ifndef STB_VORBIS_NO_INTEGER_CONVERSION
+#define PLAYBACK_MONO     1
+#define PLAYBACK_LEFT     2
+#define PLAYBACK_RIGHT    4
+
+#define L  (PLAYBACK_LEFT  | PLAYBACK_MONO)
+#define C  (PLAYBACK_LEFT  | PLAYBACK_RIGHT | PLAYBACK_MONO)
+#define R  (PLAYBACK_RIGHT | PLAYBACK_MONO)
+
+static int8 channel_position[7][6] =
+{
+   { 0 },
+   { C },
+   { L, R },
+   { L, C, R },
+   { L, R, L, R },
+   { L, C, R, L, R },
+   { L, C, R, L, R, C },
+};
+
+
+#ifndef STB_VORBIS_NO_FAST_SCALED_FLOAT
+   typedef union {
+      float f;
+      int i;
+   } float_conv;
+   typedef char stb_vorbis_float_size_test[sizeof(float)==4 && sizeof(int) == 4];
+   #define FASTDEF(x) float_conv x
+   // add (1<<23) to convert to int, then divide by 2^SHIFT, then add 0.5/2^SHIFT to round
+   #define MAGIC(SHIFT) (1.5f * (1 << (23-SHIFT)) + 0.5f/(1 << SHIFT))
+   #define ADDEND(SHIFT) (((150-SHIFT) << 23) + (1 << 22))
+   #define FAST_SCALED_FLOAT_TO_INT(temp,x,s) (temp.f = (x) + MAGIC(s), temp.i - ADDEND(s))
+   #define check_endianness()
+#else
+   #define FAST_SCALED_FLOAT_TO_INT(temp,x,s) ((int) ((x) * (1 << (s))))
+   #define check_endianness()
+   #define FASTDEF(x)
+#endif
+
+static void copy_samples(short *dest, float *src, int len)
+{
+   int i;
+   check_endianness();
+   for (i=0; i < len; ++i) {
+      FASTDEF(temp);
+      int v = FAST_SCALED_FLOAT_TO_INT(temp, src[i],15);
+      if ((unsigned int) (v + 32768) > 65535)
+         v = v < 0 ? -32768 : 32767;
+      dest[i] = v;
+   }
+}
+
+static void compute_samples(int mask, short *output, int num_c, float **data, int d_offset, int len)
+{
+   #define STB_BUFFER_SIZE  32
+   float buffer[STB_BUFFER_SIZE];
+   int i,j,o,n = STB_BUFFER_SIZE;
+   check_endianness();
+   for (o = 0; o < len; o += STB_BUFFER_SIZE) {
+      memset(buffer, 0, sizeof(buffer));
+      if (o + n > len) n = len - o;
+      for (j=0; j < num_c; ++j) {
+         if (channel_position[num_c][j] & mask) {
+            for (i=0; i < n; ++i)
+               buffer[i] += data[j][d_offset+o+i];
+         }
+      }
+      for (i=0; i < n; ++i) {
+         FASTDEF(temp);
+         int v = FAST_SCALED_FLOAT_TO_INT(temp,buffer[i],15);
+         if ((unsigned int) (v + 32768) > 65535)
+            v = v < 0 ? -32768 : 32767;
+         output[o+i] = v;
+      }
+   }
+   #undef STB_BUFFER_SIZE
+}
+
+static void compute_stereo_samples(short *output, int num_c, float **data, int d_offset, int len)
+{
+   #define STB_BUFFER_SIZE  32
+   float buffer[STB_BUFFER_SIZE];
+   int i,j,o,n = STB_BUFFER_SIZE >> 1;
+   // o is the offset in the source data
+   check_endianness();
+   for (o = 0; o < len; o += STB_BUFFER_SIZE >> 1) {
+      // o2 is the offset in the output data
+      int o2 = o << 1;
+      memset(buffer, 0, sizeof(buffer));
+      if (o + n > len) n = len - o;
+      for (j=0; j < num_c; ++j) {
+         int m = channel_position[num_c][j] & (PLAYBACK_LEFT | PLAYBACK_RIGHT);
+         if (m == (PLAYBACK_LEFT | PLAYBACK_RIGHT)) {
+            for (i=0; i < n; ++i) {
+               buffer[i*2+0] += data[j][d_offset+o+i];
+               buffer[i*2+1] += data[j][d_offset+o+i];
+            }
+         } else if (m == PLAYBACK_LEFT) {
+            for (i=0; i < n; ++i) {
+               buffer[i*2+0] += data[j][d_offset+o+i];
+            }
+         } else if (m == PLAYBACK_RIGHT) {
+            for (i=0; i < n; ++i) {
+               buffer[i*2+1] += data[j][d_offset+o+i];
+            }
+         }
+      }
+      for (i=0; i < (n<<1); ++i) {
+         FASTDEF(temp);
+         int v = FAST_SCALED_FLOAT_TO_INT(temp,buffer[i],15);
+         if ((unsigned int) (v + 32768) > 65535)
+            v = v < 0 ? -32768 : 32767;
+         output[o2+i] = v;
+      }
+   }
+   #undef STB_BUFFER_SIZE
+}
+
+static void convert_samples_short(int buf_c, short **buffer, int b_offset, int data_c, float **data, int d_offset, int samples)
+{
+   int i;
+   if (buf_c != data_c && buf_c <= 2 && data_c <= 6) {
+      static int channel_selector[3][2] = { {0}, {PLAYBACK_MONO}, {PLAYBACK_LEFT, PLAYBACK_RIGHT} };
+      for (i=0; i < buf_c; ++i)
+         compute_samples(channel_selector[buf_c][i], buffer[i]+b_offset, data_c, data, d_offset, samples);
+   } else {
+      int limit = buf_c < data_c ? buf_c : data_c;
+      for (i=0; i < limit; ++i)
+         copy_samples(buffer[i]+b_offset, data[i]+d_offset, samples);
+      for (   ; i < buf_c; ++i)
+         memset(buffer[i]+b_offset, 0, sizeof(short) * samples);
+   }
+}
+
+int stb_vorbis_get_frame_short(stb_vorbis *f, int num_c, short **buffer, int num_samples)
+{
+   float **output = NULL;
+   int len = stb_vorbis_get_frame_float(f, NULL, &output);
+   if (len > num_samples) len = num_samples;
+   if (len)
+      convert_samples_short(num_c, buffer, 0, f->channels, output, 0, len);
+   return len;
+}
+
+static void convert_channels_short_interleaved(int buf_c, short *buffer, int data_c, float **data, int d_offset, int len)
+{
+   int i;
+   check_endianness();
+   if (buf_c != data_c && buf_c <= 2 && data_c <= 6) {
+      assert(buf_c == 2);
+      // one call fills both interleaved channels (buf_c == 2 here)
+      compute_stereo_samples(buffer, data_c, data, d_offset, len);
+   } else {
+      int limit = buf_c < data_c ? buf_c : data_c;
+      int j;
+      for (j=0; j < len; ++j) {
+         for (i=0; i < limit; ++i) {
+            FASTDEF(temp);
+            float f = data[i][d_offset+j];
+            int v = FAST_SCALED_FLOAT_TO_INT(temp, f,15);
+            if ((unsigned int) (v + 32768) > 65535)
+               v = v < 0 ? -32768 : 32767;
+            *buffer++ = v;
+         }
+         for (   ; i < buf_c; ++i)
+            *buffer++ = 0;
+      }
+   }
+}
+
+int stb_vorbis_get_frame_short_interleaved(stb_vorbis *f, int num_c, short *buffer, int num_shorts)
+{
+   float **output;
+   int len;
+   if (num_c == 1) return stb_vorbis_get_frame_short(f,num_c,&buffer, num_shorts);
+   len = stb_vorbis_get_frame_float(f, NULL, &output);
+   if (len) {
+      if (len*num_c > num_shorts) len = num_shorts / num_c;
+      convert_channels_short_interleaved(num_c, buffer, f->channels, output, 0, len);
+   }
+   return len;
+}
+
+int stb_vorbis_get_samples_short_interleaved(stb_vorbis *f, int channels, short *buffer, int num_shorts)
+{
+   float **outputs;
+   int len = num_shorts / channels;
+   int n=0;
+   while (n < len) {
+      int k = f->channel_buffer_end - f->channel_buffer_start;
+      if (n+k >= len) k = len - n;
+      if (k)
+         convert_channels_short_interleaved(channels, buffer, f->channels, f->channel_buffers, f->channel_buffer_start, k);
+      buffer += k*channels;
+      n += k;
+      f->channel_buffer_start += k;
+      if (n == len) break;
+      if (!stb_vorbis_get_frame_float(f, NULL, &outputs)) break;
+   }
+   return n;
+}
+
+int stb_vorbis_get_samples_short(stb_vorbis *f, int channels, short **buffer, int len)
+{
+   float **outputs;
+   int n=0;
+   while (n < len) {
+      int k = f->channel_buffer_end - f->channel_buffer_start;
+      if (n+k >= len) k = len - n;
+      if (k)
+         convert_samples_short(channels, buffer, n, f->channels, f->channel_buffers, f->channel_buffer_start, k);
+      n += k;
+      f->channel_buffer_start += k;
+      if (n == len) break;
+      if (!stb_vorbis_get_frame_float(f, NULL, &outputs)) break;
+   }
+   return n;
+}
+
+#ifndef STB_VORBIS_NO_STDIO
+int stb_vorbis_decode_filename(const char *filename, int *channels, int *sample_rate, short **output)
+{
+   int data_len, offset, total, limit, error;
+   short *data;
+   stb_vorbis *v = stb_vorbis_open_filename(filename, &error, NULL);
+   if (v == NULL) return -1;
+   limit = v->channels * 4096;
+   *channels = v->channels;
+   if (sample_rate)
+      *sample_rate = v->sample_rate;
+   offset = data_len = 0;
+   total = limit;
+   data = (short *) malloc(total * sizeof(*data));
+   if (data == NULL) {
+      stb_vorbis_close(v);
+      return -2;
+   }
+   for (;;) {
+      int n = stb_vorbis_get_frame_short_interleaved(v, v->channels, data+offset, total-offset);
+      if (n == 0) break;
+      data_len += n;
+      offset += n * v->channels;
+      if (offset + limit > total) {
+         short *data2;
+         total *= 2;
+         data2 = (short *) realloc(data, total * sizeof(*data));
+         if (data2 == NULL) {
+            free(data);
+            stb_vorbis_close(v);
+            return -2;
+         }
+         data = data2;
+      }
+   }
+   *output = data;
+   stb_vorbis_close(v);
+   return data_len;
+}
+#endif // NO_STDIO
+
+int stb_vorbis_decode_memory(const uint8 *mem, int len, int *channels, int *sample_rate, short **output)
+{
+   int data_len, offset, total, limit, error;
+   short *data;
+   stb_vorbis *v = stb_vorbis_open_memory(mem, len, &error, NULL);
+   if (v == NULL) return -1;
+   limit = v->channels * 4096;
+   *channels = v->channels;
+   if (sample_rate)
+      *sample_rate = v->sample_rate;
+   offset = data_len = 0;
+   total = limit;
+   data = (short *) malloc(total * sizeof(*data));
+   if (data == NULL) {
+      stb_vorbis_close(v);
+      return -2;
+   }
+   for (;;) {
+      int n = stb_vorbis_get_frame_short_interleaved(v, v->channels, data+offset, total-offset);
+      if (n == 0) break;
+      data_len += n;
+      offset += n * v->channels;
+      if (offset + limit > total) {
+         short *data2;
+         total *= 2;
+         data2 = (short *) realloc(data, total * sizeof(*data));
+         if (data2 == NULL) {
+            free(data);
+            stb_vorbis_close(v);
+            return -2;
+         }
+         data = data2;
+      }
+   }
+   *output = data;
+   stb_vorbis_close(v);
+   return data_len;
+}
+#endif // STB_VORBIS_NO_INTEGER_CONVERSION
+
+int stb_vorbis_get_samples_float_interleaved(stb_vorbis *f, int channels, float *buffer, int num_floats)
+{
+   float **outputs;
+   int len = num_floats / channels;
+   int n=0;
+   int z = f->channels;
+   if (z > channels) z = channels;
+   while (n < len) {
+      int i,j;
+      int k = f->channel_buffer_end - f->channel_buffer_start;
+      if (n+k >= len) k = len - n;
+      for (j=0; j < k; ++j) {
+         for (i=0; i < z; ++i)
+            *buffer++ = f->channel_buffers[i][f->channel_buffer_start+j];
+         for (   ; i < channels; ++i)
+            *buffer++ = 0;
+      }
+      n += k;
+      f->channel_buffer_start += k;
+      if (n == len)
+         break;
+      if (!stb_vorbis_get_frame_float(f, NULL, &outputs))
+         break;
+   }
+   return n;
+}
+
+int stb_vorbis_get_samples_float(stb_vorbis *f, int channels, float **buffer, int num_samples)
+{
+   float **outputs;
+   int n=0;
+   int z = f->channels;
+   if (z > channels) z = channels;
+   while (n < num_samples) {
+      int i;
+      int k = f->channel_buffer_end - f->channel_buffer_start;
+      if (n+k >= num_samples) k = num_samples - n;
+      if (k) {
+         for (i=0; i < z; ++i)
+            memcpy(buffer[i]+n, f->channel_buffers[i]+f->channel_buffer_start, sizeof(float)*k);
+         for (   ; i < channels; ++i)
+            memset(buffer[i]+n, 0, sizeof(float) * k);
+      }
+      n += k;
+      f->channel_buffer_start += k;
+      if (n == num_samples)
+         break;
+      if (!stb_vorbis_get_frame_float(f, NULL, &outputs))
+         break;
+   }
+   return n;
+}
+#endif // STB_VORBIS_NO_PULLDATA_API
+
+/* Version history
+    1.17    - 2019-07-08 - fix CVE-2019-13217, -13218, -13219, -13220, -13221, -13222, -13223
+                           found with Mayhem by ForAllSecure
+    1.16    - 2019-03-04 - fix warnings
+    1.15    - 2019-02-07 - explicit failure if Ogg Skeleton data is found
+    1.14    - 2018-02-11 - delete bogus dealloca usage
+    1.13    - 2018-01-29 - fix truncation of last frame (hopefully)
+    1.12    - 2017-11-21 - limit residue begin/end to blocksize/2 to avoid large temp allocs in bad/corrupt files
+    1.11    - 2017-07-23 - fix MinGW compilation
+    1.10    - 2017-03-03 - more robust seeking; fix negative ilog(); clear error in open_memory
+    1.09    - 2016-04-04 - back out 'avoid discarding last frame' fix from previous version
+    1.08    - 2016-04-02 - fixed multiple warnings; fix setup memory leaks;
+                           avoid discarding last frame of audio data
+    1.07    - 2015-01-16 - fixed some warnings, fix mingw, const-correct API
+                           some more crash fixes when out of memory or with corrupt files
+    1.06    - 2015-08-31 - full, correct support for seeking API (Dougall Johnson)
+                           some crash fixes when out of memory or with corrupt files
+    1.05    - 2015-04-19 - don't define __forceinline if it's redundant
+    1.04    - 2014-08-27 - fix missing const-correct case in API
+    1.03    - 2014-08-07 - Warning fixes
+    1.02    - 2014-07-09 - Declare qsort compare function _cdecl on windows
+    1.01    - 2014-06-18 - fix stb_vorbis_get_samples_float
+    1.0     - 2014-05-26 - fix memory leaks; fix warnings; fix bugs in multichannel
+                           (API change) report sample rate for decode-full-file funcs
+    0.99996 - bracket #include <malloc.h> for macintosh compilation by Laurent Gomila
+    0.99995 - use union instead of pointer-cast for fast-float-to-int to avoid alias-optimization problem
+    0.99994 - change fast-float-to-int to work in single-precision FPU mode, remove endian-dependence
+    0.99993 - remove assert that fired on legal files with empty tables
+    0.99992 - rewind-to-start
+    0.99991 - bugfix to stb_vorbis_get_samples_short by Bernhard Wodo
+    0.9999 - (should have been 0.99990) fix no-CRT support, compiling as C++
+    0.9998 - add a full-decode function with a memory source
+    0.9997 - fix a bug in the read-from-FILE case in 0.9996 addition
+    0.9996 - query length of vorbis stream in samples/seconds
+    0.9995 - bugfix to another optimization that only happened in certain files
+    0.9994 - bugfix to one of the optimizations that caused significant (but inaudible?) errors
+    0.9993 - performance improvements; runs in 99% to 104% of time of reference implementation
+    0.9992 - performance improvement of IMDCT; now performs close to reference implementation
+    0.9991 - performance improvement of IMDCT
+    0.999 - (should have been 0.9990) performance improvement of IMDCT
+    0.998 - no-CRT support from Casey Muratori
+    0.997 - bugfixes for bugs found by Terje Mathisen
+    0.996 - bugfix: fast-huffman decode initialized incorrectly for sparse codebooks; fixing gives 10% speedup - found by Terje Mathisen
+    0.995 - bugfix: fix to 'effective' overrun detection - found by Terje Mathisen
+    0.994 - bugfix: garbage decode on final VQ symbol of a non-multiple - found by Terje Mathisen
+    0.993 - bugfix: pushdata API required 1 extra byte for empty page (failed to consume final page if empty) - found by Terje Mathisen
+    0.992 - fixes for MinGW warning
+    0.991 - turn fast-float-conversion on by default
+    0.990 - fix push-mode seek recovery if you seek into the headers
+    0.98b - fix to bad release of 0.98
+    0.98 - fix push-mode seek recovery; robustify float-to-int and support non-fast mode
+    0.97 - builds under c++ (typecasting, don't use 'class' keyword)
+    0.96 - somehow MY 0.95 was right, but the web one was wrong, so here's my 0.95 rereleased as 0.96, fixes a typo in the clamping code
+    0.95 - clamping code for 16-bit functions
+    0.94 - not publicly released
+    0.93 - fixed all-zero-floor case (was decoding garbage)
+    0.92 - fixed a memory leak
+    0.91 - conditional compiles to omit parts of the API and the infrastructure to support them: STB_VORBIS_NO_PULLDATA_API, STB_VORBIS_NO_PUSHDATA_API, STB_VORBIS_NO_STDIO, STB_VORBIS_NO_INTEGER_CONVERSION
+    0.90 - first public release
+*/
+
+#endif // STB_VORBIS_HEADER_ONLY
+
+
+/*
+------------------------------------------------------------------------------
+This software is available under 2 licenses -- choose whichever you prefer.
+------------------------------------------------------------------------------
+ALTERNATIVE A - MIT License
+Copyright (c) 2017 Sean Barrett
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
+of the Software, and to permit persons to whom the Software is furnished to do
+so, subject to the following conditions:
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+------------------------------------------------------------------------------
+ALTERNATIVE B - Public Domain (www.unlicense.org)
+This is free and unencumbered software released into the public domain.
+Anyone is free to copy, modify, publish, use, compile, sell, or distribute this
+software, either in source code form or as a compiled binary, for any purpose,
+commercial or non-commercial, and by any means.
+In jurisdictions that recognize copyright laws, the author or authors of this
+software dedicate any and all copyright interest in the software to the public
+domain. We make this dedication for the benefit of the public at large and to
+the detriment of our heirs and successors. We intend this dedication to be an
+overt act of relinquishment in perpetuity of all present and future rights to
+this software under copyright law.
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN
+ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION
+WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+------------------------------------------------------------------------------
+*/
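Review note on the 16-bit conversion in `convert_channels_short_interleaved` above: the test `(unsigned int) (v + 32768) > 65535` detects both underflow and overflow with one comparison. The sketch below isolates that trick. The helper names `clamp_to_short` and `float_sample_to_short` are hypothetical, and plain truncating scaling is assumed in place of the `FAST_SCALED_FLOAT_TO_INT` macro, so this is an illustration of the idea rather than the library's exact code path.

```c
/* Clamp an int to the signed 16-bit range.
   Adding 32768 maps the valid range [-32768, 32767] onto [0, 65535]. Any
   value outside that range becomes either negative (which converts to a
   very large unsigned number) or greater than 65535, so one unsigned
   comparison catches both cases. Assumes v + 32768 does not overflow int. */
static short clamp_to_short(int v)
{
    if ((unsigned int)(v + 32768) > 65535u)
        v = v < 0 ? -32768 : 32767;
    return (short)v;
}

/* Convert a float sample to 16-bit PCM by scaling with 2^15 and clamping.
   Truncating conversion is an assumption of this sketch; inputs are
   expected to be small enough that the scaled value fits in an int. */
static short float_sample_to_short(float f)
{
    return clamp_to_short((int)(f * 32768.0f));
}
```

With these helpers, a full-scale sample of `1.0f` scales to 32768 and is clamped to 32767, while `-1.0f` maps to -32768 without clamping.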
diff --git a/neo/sys/aros/aros_dos.cpp b/neo/sys/aros/aros_dos.cpp
index ed135cb..bcdd833 100644
--- a/neo/sys/aros/aros_dos.cpp
+++ b/neo/sys/aros/aros_dos.cpp
@@ -238,6 +238,10 @@ char *Sys_GetClipboardData(void) {
     return NULL;
 }
 
+void Sys_FreeClipboardData( char* data ) {
+	// as Sys_GetClipboardData() returns a static buffer, there's nothing to free
+}
+
 void Sys_SetClipboardData( const char *string ) {
     struct	IFFHandle	*IFFHandle;
     BOOL	written = FALSE;
@@ -471,10 +475,27 @@ void AROS_OpenURL( const char *url ) {
     URL_OpenA( (char *)url, tags );
 }
 
+bool AROS_GetSavePath(char buf[1024])
+{
+	static const size_t bufSize = 1024; // NOTE: keep in sync with caller/function sig!
+	BPTR pathlock;
+	bool ret = false;
+	if ((pathlock = Lock("PROGDIR:", SHARED_LOCK)) != BNULL)
+	{
+		if ( NameFromLock( pathlock, buf, bufSize ) )
+		{
+			D(bug("[ADoom3] AROS_GetSavePath: using '%s'\n", buf));
+			ret = true;
+		}
+		UnLock(pathlock);
+	}
+	return ret;
+}
 
 bool Sys_GetPath(sysPath_t type, idStr &path) {
     char buf[1024];
     BPTR pathlock;
+    bool ret = false;
 
     D(bug("[ADoom3] Sys_GetPath(%d)\n", type));
 
@@ -484,16 +505,11 @@ bool Sys_GetPath(sysPath_t type, idStr &path) {
     case PATH_BASE:
     case PATH_CONFIG:
     case PATH_SAVE:
-            if ((pathlock = Lock("PROGDIR:", SHARED_LOCK)) != BNULL)
-            {
-                if ( NameFromLock( pathlock, buf, sizeof( buf ) ) )
-                {
-                    D(bug("[ADoom3] Sys_GetPath: using '%s'\n", buf));
-                    path = buf;
-                }
-                UnLock(pathlock);
+            if(AROS_GetSavePath(buf)) {
+                path = buf;
+                ret = true;
             }
-            return true;
+            break;
 
     case PATH_EXE:
             if ((pathlock = Lock("PROGDIR:", SHARED_LOCK)) != BNULL)
@@ -506,11 +522,12 @@ bool Sys_GetPath(sysPath_t type, idStr &path) {
 
                     D(bug("[ADoom3] Sys_GetPath: using '%s'\n", buf));
                     path = buf;
+                    ret = true;
                 }
                 UnLock(pathlock);
             }
-            return true;
+            break;
     }
 
-    return false;
+    return ret;
 }
diff --git a/neo/sys/aros/aros_main.cpp b/neo/sys/aros/aros_main.cpp
index cdaa872..afad94f 100644
--- a/neo/sys/aros/aros_main.cpp
+++ b/neo/sys/aros/aros_main.cpp
@@ -92,6 +92,8 @@ idCVar in_tty( "in_tty", "1", CVAR_BOOL | CVAR_INIT | CVAR_SYSTEM, "terminal tab
 static bool				tty_enabled = false;
 static struct termios	tty_tc;
 
+static FILE* consoleLog = NULL;
+
 // pid - useful when you attach to gdb..
 idCVar com_pid( "com_pid", "0", CVAR_INTEGER | CVAR_INIT | CVAR_SYSTEM, "process id" );
 
@@ -119,6 +121,11 @@ void AROS_Exit(int ret) {
     // at this point, too late to catch signals
     AROS_ClearSigs();
 
+    if( consoleLog != NULL ) {
+        fclose(consoleLog);
+        consoleLog = NULL;
+    }
+
     // process spawning. it's best when it happens after everything has shut down
     if ( exit_spawn[0] ) {
         Sys_DoStartProcess( exit_spawn, false );
@@ -309,6 +316,35 @@ void Sys_SetPhysicalWorkMemory( int minBytes, int maxBytes ) {
     common->DPrintf( "TODO: Sys_SetPhysicalWorkMemory\n" );
 }
 
+extern bool AROS_GetSavePath(char buf[1024]);
+
+static void initLog()
+{
+	char logPath[1024];
+	if(!AROS_GetSavePath(logPath))
+		return;
+
+	// TODO: create savePath directory if it doesn't exist..
+
+	char logBkPath[1024];
+	strcpy(logBkPath, logPath);
+	idStr::Append(logBkPath, sizeof(logBkPath), PATHSEPERATOR_STR "dhewm3log-old.txt");
+	idStr::Append(logPath, sizeof(logPath), PATHSEPERATOR_STR "dhewm3log.txt");
+
+	rename(logPath, logBkPath); // I hope AROS supports this, but it's standard C89 so it should
+
+	consoleLog = fopen(logPath, "w");
+	if(consoleLog == NULL) {
+		printf("WARNING: Couldn't open/create '%s', error was: %d (%s)\n", logPath, errno, strerror(errno));
+	} else {
+		time_t tt = time(NULL);
+		const struct tm* tms = localtime(&tt);
+		char timeStr[64] = {};
+		strftime(timeStr, sizeof(timeStr), "%F %H:%M:%S", tms);
+		fprintf(consoleLog, "Opened this log at %s\n", timeStr);
+	}
+}
+
 /*
 ===============
 AROS_EarlyInit
@@ -317,9 +353,13 @@ AROS_EarlyInit
 void AROS_EarlyInit( void ) {
     bug("[ADoom3] %s()\n", __PRETTY_FUNCTION__);
 
+    initLog();
+
     exit_spawn[0] = '\0';
     AROS_InitLibs();
     AROS_InitSigs();
+
+    // TODO: logfile
 }
 
 /*
@@ -736,19 +776,36 @@ low level output
 ===============
 */
 
+void Sys_VPrintf(const char *msg, va_list arg) {
+	// gonna use arg twice, so copy it
+	va_list arg2;
+	va_copy(arg2, arg);
+
+	// first print to stdout
+	vprintf(msg, arg2);
+
+	va_end(arg2); // arg2 is not needed anymore
+
+	// then print to the log, if any
+	if(consoleLog != NULL)
+	{
+		vfprintf(consoleLog, msg, arg);
+	}
+}
+
 void Sys_DebugPrintf( const char *fmt, ... ) {
     va_list argptr;
 
     tty_Hide();
     va_start( argptr, fmt );
-    vprintf( fmt, argptr );
+    Sys_VPrintf( fmt, argptr );
     va_end( argptr );
     tty_Show();
 }
 
 void Sys_DebugVPrintf( const char *fmt, va_list arg ) {
     tty_Hide();
-    vprintf( fmt, arg );
+    Sys_VPrintf( fmt, arg );
     tty_Show();
 }
 
@@ -757,17 +814,11 @@ void Sys_Printf(const char *msg, ...) {
 
     tty_Hide();
     va_start( argptr, msg );
-    vprintf( msg, argptr );
+    Sys_VPrintf( msg, argptr );
     va_end( argptr );
     tty_Show();
 }
 
-void Sys_VPrintf(const char *msg, va_list arg) {
-    tty_Hide();
-    vprintf(msg, arg);
-    tty_Show();
-}
-
 /*
 ================
 Sys_Error
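Review note on the new `Sys_VPrintf` above: a `va_list` may only be traversed once, so sending the same arguments to two sinks requires `va_copy`. The sketch below shows the same pattern with two memory buffers instead of stdout and a log file, so it can be checked in isolation. The function names `format_twice` and `tee_format` are hypothetical and exist only for this illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Format the same arguments into two buffers.
   The first pass consumes a copy of the list; the second pass consumes
   the caller's original list, mirroring the order used in the patch. */
static int format_twice(char *a, size_t a_size, char *b, size_t b_size,
                        const char *fmt, va_list args)
{
    va_list copy;
    int n;

    va_copy(copy, args);                  /* args is used twice, so copy it */
    n = vsnprintf(a, a_size, fmt, copy);  /* first sink */
    va_end(copy);                         /* every va_copy needs a va_end */

    vsnprintf(b, b_size, fmt, args);      /* second sink uses the original */
    return n;
}

/* Variadic front end, comparable to Sys_Printf calling Sys_VPrintf. */
static int tee_format(char *a, size_t a_size, char *b, size_t b_size,
                      const char *fmt, ...)
{
    va_list args;
    int n;

    va_start(args, fmt);
    n = format_twice(a, a_size, b, b_size, fmt, args);
    va_end(args);
    return n;
}
```

Calling `tee_format(a, sizeof a, b, sizeof b, "%s=%d", "x", 42)` fills both buffers with the same text. Reusing `args` for the first call without the copy would leave the list in an indeterminate state for the second call.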
diff --git a/neo/sys/aros/aros_net.cpp b/neo/sys/aros/aros_net.cpp
index e61bc0a..6e5e99f 100644
--- a/neo/sys/aros/aros_net.cpp
+++ b/neo/sys/aros/aros_net.cpp
@@ -117,8 +117,7 @@ ExtractPort
 */
 static bool ExtractPort( const char *src, char *buf, int bufsize, int *port ) {
 	char *p;
-	strncpy( buf, src, bufsize );
-	p = buf; p += Min( bufsize - 1, (int)strlen( src ) ); *p = '\0';
+	idStr::Copynz( buf, src, bufsize );
 	p = strchr( buf, ':' );
 	if ( !p ) {
 		return false;
diff --git a/neo/sys/aros/aros_signal.cpp b/neo/sys/aros/aros_signal.cpp
index f8626f4..4271c72 100644
--- a/neo/sys/aros/aros_signal.cpp
+++ b/neo/sys/aros/aros_signal.cpp
@@ -170,5 +170,5 @@ Sys_SetFatalError
 ==================
 */
 void Sys_SetFatalError( const char *error ) {
-	strncpy( fatalError, error, sizeof( fatalError ) );
+	idStr::Copynz( fatalError, error, sizeof( fatalError ) );
 }
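Review note on the two `strncpy` replacements above: `strncpy` does not write a terminating `\0` when the source is at least as long as the destination, which is why the old `ExtractPort` code terminated the buffer by hand and why the old `Sys_SetFatalError` could leave `fatalError` unterminated. The sketch below uses a hypothetical `copy_nz` helper as a stand-in for `idStr::Copynz`; its behavior (copy at most `size - 1` bytes, always terminate) is an assumption based on how the patch uses that function, not a copy of the engine's implementation.

```c
#include <string.h>

/* Hypothetical stand-in for idStr::Copynz: copy src into dst, truncating
   if necessary, and always leave dst null-terminated. */
static void copy_nz(char *dst, const char *src, size_t dst_size)
{
    if (dst == NULL || dst_size == 0)
        return;                       /* nothing can be written safely */
    if (src == NULL) {
        dst[0] = '\0';                /* treat a missing source as empty */
        return;
    }
    strncpy(dst, src, dst_size - 1);  /* leave room for the terminator */
    dst[dst_size - 1] = '\0';         /* terminate even when truncated */
}
```

With a 4-byte buffer, `copy_nz(buf, "abcdef", sizeof buf)` yields `"abc"`, whereas a bare `strncpy(buf, "abcdef", sizeof buf)` would fill all four bytes and leave no terminator.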
diff --git a/neo/sys/cmake/FindOGG.cmake b/neo/sys/cmake/FindOGG.cmake
deleted file mode 100644
index b62ac80..0000000
--- a/neo/sys/cmake/FindOGG.cmake
+++ /dev/null
@@ -1,83 +0,0 @@
-# Locate OGG
-# This module defines XXX_FOUND, XXX_INCLUDE_DIRS and XXX_LIBRARIES standard variables
-#
-# $OGGDIR is an environment variable that would
-# correspond to the ./configure --prefix=$OGGDIR
-# used in building OGG.
-
-SET(OGG_SEARCH_PATHS
-	~/Library/Frameworks
-	/Library/Frameworks
-	/usr/local
-	/usr
-	/sw # Fink
-	/opt/local # DarwinPorts
-	/opt/csw # Blastwave
-	/opt
-)
-
-SET(MSVC_YEAR_NAME)
-IF (MSVC_VERSION GREATER 1599)		# >= 1600
-	SET(MSVC_YEAR_NAME VS2010)
-ELSEIF(MSVC_VERSION GREATER 1499)	# >= 1500
-	SET(MSVC_YEAR_NAME VS2008)
-ELSEIF(MSVC_VERSION GREATER 1399)	# >= 1400
-	SET(MSVC_YEAR_NAME VS2005)
-ELSEIF(MSVC_VERSION GREATER 1299)	# >= 1300
-	SET(MSVC_YEAR_NAME VS2003)
-ELSEIF(MSVC_VERSION GREATER 1199)	# >= 1200
-	SET(MSVC_YEAR_NAME VS6)
-ENDIF()
-
-FIND_PATH(OGG_INCLUDE_DIR
-	NAMES ogg/ogg.h ogg/os_types.h
-	HINTS
-	$ENV{OGGDIR}
-	$ENV{OGG_PATH}
-	PATH_SUFFIXES include
-	PATHS ${OGG_SEARCH_PATHS}
-)
-
-FIND_LIBRARY(OGG_LIBRARY
-	NAMES ogg libogg
-	HINTS
-	$ENV{OGGDIR}
-	$ENV{OGG_PATH}
-	PATH_SUFFIXES lib lib64 win32/Dynamic_Release "Win32/${MSVC_YEAR_NAME}/x64/Release" "Win32/${MSVC_YEAR_NAME}/Win32/Release"
-	PATHS ${OGG_SEARCH_PATHS}
-)
-
-# First search for d-suffixed libs
-FIND_LIBRARY(OGG_LIBRARY_DEBUG
-	NAMES oggd ogg_d liboggd libogg_d
-	HINTS
-	$ENV{OGGDIR}
-	$ENV{OGG_PATH}
-	PATH_SUFFIXES lib lib64 win32/Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-	PATHS ${OGG_SEARCH_PATHS}
-)
-
-IF(NOT OGG_LIBRARY_DEBUG)
-	# Then search for non suffixed libs if necessary, but only in debug dirs
-	FIND_LIBRARY(OGG_LIBRARY_DEBUG
-		NAMES ogg libogg
-		HINTS
-		$ENV{OGGDIR}
-		$ENV{OGG_PATH}
-		PATH_SUFFIXES win32/Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-		PATHS ${OGG_SEARCH_PATHS}
-	)
-ENDIF()
-
-
-IF(OGG_LIBRARY)
-	IF(OGG_LIBRARY_DEBUG)
-		SET(OGG_LIBRARIES optimized "${OGG_LIBRARY}" debug "${OGG_LIBRARY_DEBUG}")
-	ELSE()
-		SET(OGG_LIBRARIES "${OGG_LIBRARY}")		# Could add "general" keyword, but it is optional
-	ENDIF()
-ENDIF()
-
-# handle the QUIETLY and REQUIRED arguments and set XXX_FOUND to TRUE if all listed variables are TRUE
-INCLUDE(FindPackageHandleStandardArgs)
-FIND_PACKAGE_HANDLE_STANDARD_ARGS(OGG DEFAULT_MSG OGG_LIBRARIES OGG_INCLUDE_DIR)
diff --git a/neo/sys/cmake/FindVorbis.cmake b/neo/sys/cmake/FindVorbis.cmake
deleted file mode 100644
index b83c13e..0000000
--- a/neo/sys/cmake/FindVorbis.cmake
+++ /dev/null
@@ -1,83 +0,0 @@
-# Locate Vorbis
-# This module defines XXX_FOUND, XXX_INCLUDE_DIRS and XXX_LIBRARIES standard variables
-#
-# $VORBISDIR is an environment variable that would
-# correspond to the ./configure --prefix=$VORBISDIR
-# used in building Vorbis.
-
-SET(VORBIS_SEARCH_PATHS
-	~/Library/Frameworks
-	/Library/Frameworks
-	/usr/local
-	/usr
-	/sw # Fink
-	/opt/local # DarwinPorts
-	/opt/csw # Blastwave
-	/opt
-)
-
-SET(MSVC_YEAR_NAME)
-IF (MSVC_VERSION GREATER 1599)		# >= 1600
-	SET(MSVC_YEAR_NAME VS2010)
-ELSEIF(MSVC_VERSION GREATER 1499)	# >= 1500
-	SET(MSVC_YEAR_NAME VS2008)
-ELSEIF(MSVC_VERSION GREATER 1399)	# >= 1400
-	SET(MSVC_YEAR_NAME VS2005)
-ELSEIF(MSVC_VERSION GREATER 1299)	# >= 1300
-	SET(MSVC_YEAR_NAME VS2003)
-ELSEIF(MSVC_VERSION GREATER 1199)	# >= 1200
-	SET(MSVC_YEAR_NAME VS6)
-ENDIF()
-
-FIND_PATH(VORBIS_INCLUDE_DIR
-	NAMES vorbis/codec.h
-	HINTS
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES include
-	PATHS ${VORBIS_SEARCH_PATHS}
-)
-
-FIND_LIBRARY(VORBIS_LIBRARY
-	NAMES vorbis libvorbis
-	HINTS
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES lib lib64 win32/Vorbis_Dynamic_Release "Win32/${MSVC_YEAR_NAME}/x64/Release" "Win32/${MSVC_YEAR_NAME}/Win32/Release"
-	PATHS ${VORBIS_SEARCH_PATHS}
-)
-
-# First search for d-suffixed libs
-FIND_LIBRARY(VORBIS_LIBRARY_DEBUG
-	NAMES vorbisd vorbis_d libvorbisd libvorbis_d
-	HINTS
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES lib lib64 win32/Vorbis_Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-	PATHS ${VORBIS_SEARCH_PATHS}
-)
-
-IF(NOT VORBIS_LIBRARY_DEBUG)
-	# Then search for non suffixed libs if necessary, but only in debug dirs
-	FIND_LIBRARY(VORBIS_LIBRARY_DEBUG
-		NAMES vorbis libvorbis
-		HINTS
-		$ENV{VORBISDIR}
-		$ENV{VORBIS_PATH}
-		PATH_SUFFIXES win32/Vorbis_Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-		PATHS ${VORBIS_SEARCH_PATHS}
-	)
-ENDIF()
-
-
-IF(VORBIS_LIBRARY)
-	IF(VORBIS_LIBRARY_DEBUG)
-		SET(VORBIS_LIBRARIES optimized "${VORBIS_LIBRARY}" debug "${VORBIS_LIBRARY_DEBUG}")
-	ELSE()
-		SET(VORBIS_LIBRARIES "${VORBIS_LIBRARY}")		# Could add "general" keyword, but it is optional
-	ENDIF()
-ENDIF()
-
-# handle the QUIETLY and REQUIRED arguments and set XXX_FOUND to TRUE if all listed variables are TRUE
-INCLUDE(FindPackageHandleStandardArgs)
-FIND_PACKAGE_HANDLE_STANDARD_ARGS(VORBIS DEFAULT_MSG VORBIS_LIBRARIES VORBIS_INCLUDE_DIR)
diff --git a/neo/sys/cmake/FindVorbisFile.cmake b/neo/sys/cmake/FindVorbisFile.cmake
deleted file mode 100644
index 09a0264..0000000
--- a/neo/sys/cmake/FindVorbisFile.cmake
+++ /dev/null
@@ -1,91 +0,0 @@
-# Locate VorbisFile
-# This module defines XXX_FOUND, XXX_INCLUDE_DIRS and XXX_LIBRARIES standard variables
-#
-# $VORBISDIR is an environment variable that would
-# correspond to the ./configure --prefix=$VORBISDIR
-# used in building Vorbis.
-
-SET(VORBISFILE_SEARCH_PATHS
-	~/Library/Frameworks
-	/Library/Frameworks
-    /usr/local
-	/usr
-	/sw # Fink
-	/opt/local # DarwinPorts
-	/opt/csw # Blastwave
-	/opt
-)
-
-SET(MSVC_YEAR_NAME)
-IF (MSVC_VERSION GREATER 1599)		# >= 1600
-	SET(MSVC_YEAR_NAME VS2010)
-ELSEIF(MSVC_VERSION GREATER 1499)	# >= 1500
-	SET(MSVC_YEAR_NAME VS2008)
-ELSEIF(MSVC_VERSION GREATER 1399)	# >= 1400
-	SET(MSVC_YEAR_NAME VS2005)
-ELSEIF(MSVC_VERSION GREATER 1299)	# >= 1300
-	SET(MSVC_YEAR_NAME VS2003)
-ELSEIF(MSVC_VERSION GREATER 1199)	# >= 1200
-	SET(MSVC_YEAR_NAME VS6)
-ENDIF()
-
-FIND_PATH(VORBISFILE_INCLUDE_DIR
-	NAMES vorbis/vorbisfile.h
-	HINTS
-	$ENV{VORBISFILEDIR}
-	$ENV{VORBISFILE_PATH}
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES include
-	PATHS ${VORBISFILE_SEARCH_PATHS}
-)
-
-FIND_LIBRARY(VORBISFILE_LIBRARY
-	NAMES vorbisfile libvorbisfile
-	HINTS
-	$ENV{VORBISFILEDIR}
-	$ENV{VORBISFILE_PATH}
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES lib lib64 win32/VorbisFile_Dynamic_Release "Win32/${MSVC_YEAR_NAME}/x64/Release" "Win32/${MSVC_YEAR_NAME}/Win32/Release"
-	PATHS ${VORBISFILE_SEARCH_PATHS}
-)
-
-# First search for d-suffixed libs
-FIND_LIBRARY(VORBISFILE_LIBRARY_DEBUG
-	NAMES vorbisfiled vorbisfile_d libvorbisfiled libvorbisfile_d
-	HINTS
-	$ENV{VORBISFILEDIR}
-	$ENV{VORBISFILE_PATH}
-	$ENV{VORBISDIR}
-	$ENV{VORBIS_PATH}
-	PATH_SUFFIXES lib lib64 win32/VorbisFile_Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-	PATHS ${VORBISFILE_SEARCH_PATHS}
-)
-
-IF(NOT VORBISFILE_LIBRARY_DEBUG)
-	# Then search for non suffixed libs if necessary, but only in debug dirs
-	FIND_LIBRARY(VORBISFILE_LIBRARY_DEBUG
-		NAMES vorbisfile libvorbisfile
-		HINTS
-		$ENV{VORBISFILEDIR}
-		$ENV{VORBISFILE_PATH}
-		$ENV{VORBISDIR}
-		$ENV{VORBIS_PATH}
-		PATH_SUFFIXES win32/VorbisFile_Dynamic_Debug "Win32/${MSVC_YEAR_NAME}/x64/Debug" "Win32/${MSVC_YEAR_NAME}/Win32/Debug"
-		PATHS ${VORBISFILE_SEARCH_PATHS}
-	)
-ENDIF()
-
-
-IF(VORBISFILE_LIBRARY)
-	IF(VORBISFILE_LIBRARY_DEBUG)
-		SET(VORBISFILE_LIBRARIES optimized "${VORBISFILE_LIBRARY}" debug "${VORBISFILE_LIBRARY_DEBUG}")
-	ELSE()
-		SET(VORBISFILE_LIBRARIES "${VORBISFILE_LIBRARY}")		# Could add "general" keyword, but it is optional
-	ENDIF()
-ENDIF()
-
-# handle the QUIETLY and REQUIRED arguments and set XXX_FOUND to TRUE if all listed variables are TRUE
-INCLUDE(FindPackageHandleStandardArgs)
-FIND_PACKAGE_HANDLE_STANDARD_ARGS(VORBISFILE DEFAULT_MSG VORBISFILE_LIBRARIES VORBISFILE_INCLUDE_DIR)
diff --git a/neo/sys/events.cpp b/neo/sys/events.cpp
index 8abd0b0..6cc0451 100644
--- a/neo/sys/events.cpp
+++ b/neo/sys/events.cpp
@@ -32,8 +32,9 @@ If you have questions concerning this license or the applicable additional terms
 #include "idlib/containers/List.h"
 #include "idlib/Heap.h"
 #include "framework/Common.h"
+#include "framework/Console.h"
 #include "framework/KeyInput.h"
-#include "framework/Session.h"
+#include "framework/Session_local.h"
 #include "renderer/RenderSystem.h"
 #include "renderer/tr_local.h"
 
@@ -59,11 +60,27 @@ If you have questions concerning this license or the applicable additional terms
 #define SDLK_PRINTSCREEN SDLK_PRINT
 #endif
 
-const char *kbdNames[] = {
+// NOTE: g++-4.7 doesn't like when this is static (for idCmdSystem::ArgCompletion_String<kbdNames>)
+const char *_in_kbdNames[] = {
+#if SDL_VERSION_ATLEAST(2, 0, 0) // auto-detection is only available for SDL2
+	"auto",
+#endif
 	"english", "french", "german", "italian", "spanish", "turkish", "norwegian", "brazilian", NULL
 };
 
-idCVar in_kbd("in_kbd", "english", CVAR_SYSTEM | CVAR_ARCHIVE | CVAR_NOCHEAT, "keyboard layout", kbdNames, idCmdSystem::ArgCompletion_String<kbdNames> );
+static idCVar in_kbd("in_kbd", _in_kbdNames[0], CVAR_SYSTEM | CVAR_ARCHIVE | CVAR_NOCHEAT, "keyboard layout", _in_kbdNames, idCmdSystem::ArgCompletion_String<_in_kbdNames> );
+// TODO: I'd really like to make in_ignoreConsoleKey default to 1, but I guess there would be too much confusion :-/
+static idCVar in_ignoreConsoleKey("in_ignoreConsoleKey", "0", CVAR_SYSTEM | CVAR_ARCHIVE | CVAR_NOCHEAT | CVAR_BOOL,
+		"Console only opens with Shift+Esc, not ` or ^ etc");
+
+static idCVar in_nograb("in_nograb", "0", CVAR_SYSTEM | CVAR_NOCHEAT, "prevents input grabbing");
+static idCVar in_grabKeyboard("in_grabKeyboard", "0", CVAR_SYSTEM | CVAR_ARCHIVE | CVAR_NOCHEAT | CVAR_BOOL,
+		"if enabled, grabs all keyboard input if mouse is grabbed (so keyboard shortcuts from the OS like Alt-Tab or Windows Key won't work)");
+
+// set in handleMouseGrab(), used in Sys_GetEvent() to decide what kind of internal mouse event to generate
+static bool in_relativeMouseMode = true;
+// set in Sys_GetEvent() on window focus gained/lost events
+static bool in_hasFocus = true;
 
 struct kbd_poll_t {
 	int key;
@@ -94,6 +111,204 @@ struct mouse_poll_t {
 static idList<kbd_poll_t> kbd_polls;
 static idList<mouse_poll_t> mouse_polls;
 
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+// for utf8ToISO8859_1() - used for non-ascii text input and Sys_GetLocalizedScancodeName()
+static SDL_iconv_t iconvDesc = (SDL_iconv_t)-1;
+#endif
+
+struct scancodename_t {
+	int sdlScancode;
+	const char* name;
+};
+
+// scancodemappings[keynum - K_FIRST_SCANCODE] belongs to keynum
+static scancodename_t scancodemappings[] = {
+	// NOTE: must be kept in sync with the K_SC_* section of keyNum_t in framework/KeyInput.h !
+
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	#define D3_SC_MAPPING(X) { SDL_SCANCODE_ ## X , "SC_" #X }
+#else // SDL1.2 doesn't have scancodes
+	#define D3_SC_MAPPING(X) { 0 , "SC_" #X }
+#endif
+
+	D3_SC_MAPPING(A), // { SDL_SCANCODE_A, "SC_A" },
+	D3_SC_MAPPING(B),
+	D3_SC_MAPPING(C),
+	D3_SC_MAPPING(D),
+	D3_SC_MAPPING(E),
+	D3_SC_MAPPING(F),
+	D3_SC_MAPPING(G),
+	D3_SC_MAPPING(H),
+	D3_SC_MAPPING(I),
+	D3_SC_MAPPING(J),
+	D3_SC_MAPPING(K),
+	D3_SC_MAPPING(L),
+	D3_SC_MAPPING(M),
+	D3_SC_MAPPING(N),
+	D3_SC_MAPPING(O),
+	D3_SC_MAPPING(P),
+	D3_SC_MAPPING(Q),
+	D3_SC_MAPPING(R),
+	D3_SC_MAPPING(S),
+	D3_SC_MAPPING(T),
+	D3_SC_MAPPING(U),
+	D3_SC_MAPPING(V),
+	D3_SC_MAPPING(W),
+	D3_SC_MAPPING(X),
+	D3_SC_MAPPING(Y),
+	D3_SC_MAPPING(Z),
+	// leaving out SDL_SCANCODE_1 ... _0, we handle them separately already
+	// also return, escape, backspace, tab, space, already handled as keycodes
+	D3_SC_MAPPING(MINUS),
+	D3_SC_MAPPING(EQUALS),
+	D3_SC_MAPPING(LEFTBRACKET),
+	D3_SC_MAPPING(RIGHTBRACKET),
+	D3_SC_MAPPING(BACKSLASH),
+	D3_SC_MAPPING(NONUSHASH),
+	D3_SC_MAPPING(SEMICOLON),
+	D3_SC_MAPPING(APOSTROPHE),
+	D3_SC_MAPPING(GRAVE),
+	D3_SC_MAPPING(COMMA),
+	D3_SC_MAPPING(PERIOD),
+	D3_SC_MAPPING(SLASH),
+	// leaving out lots of keys, incl. those from the keypad; we already handle them as normal keys
+	D3_SC_MAPPING(NONUSBACKSLASH),
+	D3_SC_MAPPING(INTERNATIONAL1), /**< used on Asian keyboards, see footnotes in USB doc */
+	D3_SC_MAPPING(INTERNATIONAL2),
+	D3_SC_MAPPING(INTERNATIONAL3), /**< Yen */
+	D3_SC_MAPPING(INTERNATIONAL4),
+	D3_SC_MAPPING(INTERNATIONAL5),
+	D3_SC_MAPPING(INTERNATIONAL6),
+	D3_SC_MAPPING(INTERNATIONAL7),
+	D3_SC_MAPPING(INTERNATIONAL8),
+	D3_SC_MAPPING(INTERNATIONAL9),
+	D3_SC_MAPPING(THOUSANDSSEPARATOR),
+	D3_SC_MAPPING(DECIMALSEPARATOR),
+	D3_SC_MAPPING(CURRENCYUNIT),
+	D3_SC_MAPPING(CURRENCYSUBUNIT)
+
+#undef D3_SC_MAPPING
+};
+
+// for keynums between K_FIRST_SCANCODE and K_LAST_SCANCODE
+// returns e.g. "SC_A" for K_SC_A
+const char* Sys_GetScancodeName( int key ) {
+	if ( key >= K_FIRST_SCANCODE && key <= K_LAST_SCANCODE ) {
+		int scIdx = key - K_FIRST_SCANCODE;
+		return scancodemappings[scIdx].name;
+	}
+	return NULL;
+}
+
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+static bool isAscii( const char* str_ ) {
+	const unsigned char* str = (const unsigned char*)str_;
+	while(*str != '\0') {
+		if(*str > 127) {
+			return false;
+		}
+		++str;
+	}
+	return true;
+}
+
+// convert inbuf (which is expected to be in UTF-8) to outbuf (in ISO-8859-1)
+static bool utf8ToISO8859_1(const char* inbuf, char* outbuf, size_t outsize) {
+	if ( iconvDesc == (SDL_iconv_t)-1 ) {
+		return false;
+	}
+
+	size_t outbytesleft = outsize;
+	size_t inbytesleft = strlen( inbuf ) + 1; // + terminating \0
+	size_t ret = SDL_iconv( iconvDesc, &inbuf, &inbytesleft, &outbuf, &outbytesleft );
+
+	while(inbytesleft > 0) {
+		switch ( ret ) {
+			case SDL_ICONV_E2BIG:
+				outbuf[outbytesleft-1] = '\0'; // whatever, just cut it off..
+				common->DPrintf( "Cutting off UTF-8 to ISO-8859-1 conversion to '%s' because destination is too small for '%s'\n", outbuf, inbuf );
+				SDL_iconv( iconvDesc, NULL, NULL, NULL, NULL ); // reset descriptor for next conversion
+				return true;
+			case SDL_ICONV_EILSEQ:
+				// try skipping invalid input data
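+				++inbuf; --inbytesleft; // skip one input byte ..
+				ret = SDL_iconv( iconvDesc, &inbuf, &inbytesleft, &outbuf, &outbytesleft ); // .. and retry with the rest
+				break;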
+			case SDL_ICONV_EINVAL:
+			case SDL_ICONV_ERROR:
+				// we can't recover from this
+				SDL_iconv( iconvDesc, NULL, NULL, NULL, NULL ); // reset descriptor for next conversion
+				return false;
+		}
+	}
+	SDL_iconv( iconvDesc, NULL, NULL, NULL, NULL ); // reset descriptor for next conversion
+	return outbytesleft < outsize; // return false if no char was written
+}
+#endif // SDL2
+
+// returns localized name of the key (between K_FIRST_SCANCODE and K_LAST_SCANCODE),
+// according to the current keyboard layout - if that name is in ASCII or corresponds
+// to a "High-ASCII" char supported by Doom3.
+// Otherwise return same name as Sys_GetScancodeName()
+// !! Returned string is only valid until next call to this function !!
+const char* Sys_GetLocalizedScancodeName( int key ) {
+	if ( key >= K_FIRST_SCANCODE && key <= K_LAST_SCANCODE ) {
+		int scIdx = key - K_FIRST_SCANCODE;
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		SDL_Scancode sc = ( SDL_Scancode ) scancodemappings[scIdx].sdlScancode;
+		SDL_Keycode k = SDL_GetKeyFromScancode( sc );
+		if ( k >= 0xA1 && k <= 0xFF ) {
+			// luckily, the "High-ASCII" (ISO-8859-1) chars supported by Doom3
+			// have the same values as the corresponding SDL_Keycodes.
+			static char oneCharStr[2] = {0, 0};
+			oneCharStr[0] = (unsigned char)k;
+			return oneCharStr;
+		} else if ( k != SDLK_UNKNOWN ) {
+			const char *ret = SDL_GetKeyName( k );
+			// the keyname from SDL2 is in UTF-8, which Doom3 can't print,
+			// so return the name if it's ASCII, otherwise try ISO-8859-1 and finally fall back to "SC_bla"
+			if ( ret && *ret != '\0' ) {
+				if( isAscii( ret ) ) {
+					return ret;
+				}
+				static char isoName[32];
+				// try to convert name to ISO8859-1 (Doom3's supported "High ASCII")
+				if ( utf8ToISO8859_1( ret, isoName, sizeof(isoName) ) && isoName[0] != '\0' ) {
+					return isoName;
+				}
+			}
+		}
+#endif  // SDL1.2 doesn't support this, use unlocalized name (also as fallback if we couldn't get a keyname)
+		return scancodemappings[scIdx].name;
+
+	}
+	return NULL;
+}
+
+// returns keyNum_t (K_SC_* constant) for given scancode name (like "SC_A")
+// only makes sense to call it if name starts with "SC_" (or "sc_")
+// returns -1 if not found
+int Sys_GetKeynumForScancodeName( const char* name ) {
+	for( int scIdx = 0; scIdx < K_NUM_SCANCODES; ++scIdx ) {
+		if ( idStr::Icmp( name, scancodemappings[scIdx].name ) == 0 ) {
+			return scIdx + K_FIRST_SCANCODE;
+		}
+	}
+	return -1;
+}
+
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+static int getKeynumForSDLscancode( SDL_Scancode scancode ) {
+	int sc = scancode;
+	for ( int scIdx=0; scIdx < K_NUM_SCANCODES; ++scIdx ) {
+		if ( scancodemappings[scIdx].sdlScancode == sc ) {
+			return scIdx + K_FIRST_SCANCODE;
+		}
+	}
+	return 0;
+}
+#endif
+
 static byte mapkey(SDL_Keycode key) {
 	switch (key) {
 	case SDLK_BACKSPACE:
@@ -132,12 +347,15 @@ static byte mapkey(SDL_Keycode key) {
 		return K_MENU;
 
 	case SDLK_LALT:
-	case SDLK_RALT:
 		return K_ALT;
+	case SDLK_RALT:
+		return K_RIGHT_ALT;
 	case SDLK_RCTRL:
+		return K_RIGHT_CTRL;
 	case SDLK_LCTRL:
 		return K_CTRL;
 	case SDLK_RSHIFT:
+		return K_RIGHT_SHIFT;
 	case SDLK_LSHIFT:
 		return K_SHIFT;
 	case SDLK_INSERT:
@@ -252,6 +470,7 @@ static byte mapkey(SDL_Keycode key) {
 	case SDLK_PRINTSCREEN:
 		return K_PRINT_SCR;
 	case SDLK_MODE:
+		// FIXME: is this really right alt? (also mapping SDLK_RALT to K_RIGHT_ALT)
 		return K_RIGHT_ALT;
 	}
 
@@ -285,12 +504,33 @@ void Sys_InitInput() {
 	kbd_polls.SetGranularity(64);
 	mouse_polls.SetGranularity(64);
 
+	assert(sizeof(scancodemappings)/sizeof(scancodemappings[0]) == K_NUM_SCANCODES && "scancodemappings incomplete?");
+
 #if !SDL_VERSION_ATLEAST(2, 0, 0)
 	SDL_EnableUNICODE(1);
 	SDL_EnableKeyRepeat(SDL_DEFAULT_REPEAT_DELAY, SDL_DEFAULT_REPEAT_INTERVAL);
+
+#else // SDL2 - for utf8ToISO8859_1() (non-ascii text input and key naming)
+	assert(iconvDesc == (SDL_iconv_t)-1);
+	iconvDesc = SDL_iconv_open( "ISO-8859-1", "UTF-8" );
+	if( iconvDesc == (SDL_iconv_t)-1 ) {
+		common->Warning( "Sys_InitInput(): iconv_open( \"ISO-8859-1\", \"UTF-8\" ) failed! Can't translate non-ascii input!\n" );
+	}
 #endif
 
 	in_kbd.SetModified();
+	Sys_GetConsoleKey(false); // initialize consoleKeyMappingIdx from in_kbd
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	const char* grabKeyboardEnv = SDL_getenv(SDL_HINT_GRAB_KEYBOARD);
+	if ( grabKeyboardEnv ) {
+		common->Printf( "The SDL_GRAB_KEYBOARD environment variable is set, setting the in_grabKeyboard CVar to the same value (%s)\n", grabKeyboardEnv );
+		in_grabKeyboard.SetString( grabKeyboardEnv );
+	} else {
+		in_grabKeyboard.SetModified();
+	}
+#else // SDL1.2 doesn't support this
+	in_grabKeyboard.ClearModified();
+#endif
 }
 
 /*
@@ -301,6 +541,10 @@ Sys_ShutdownInput
 void Sys_ShutdownInput() {
 	kbd_polls.Clear();
 	mouse_polls.Clear();
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	SDL_iconv_close( iconvDesc ); // used by utf8ToISO8859_1()
+	iconvDesc = ( SDL_iconv_t ) -1;
+#endif
 }
 
 /*
@@ -314,46 +558,89 @@ void Sys_InitScanTable() {
 }
 #endif
 
+
+struct ConsoleKeyMapping {
+	const char* langName;
+	unsigned char key;
+	unsigned char keyShifted;
+};
+
+static ConsoleKeyMapping consoleKeyMappings[] = {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	{ "auto",   	 0 ,	0   }, // special case: set current keycode for SDL_SCANCODE_GRAVE (no shifted keycode, though)
+#endif
+	{ "english",	'`',	'~' },
+	{ "french", 	'<',	'>' },
+	{ "german", 	'^',	176 }, // °
+	{ "italian",	'\\',	'|' },
+	{ "spanish",	186,	170 }, // º ª
+	{ "turkish",	'"',	233 }, // é
+	{ "norwegian",	124,	167 }, // | §
+	{ "brazilian",	'\'',	'"' },
+};
+static int consoleKeyMappingIdx = 0;
+
+static void initConsoleKeyMapping() {
+	const int numMappings = sizeof(consoleKeyMappings)/sizeof(consoleKeyMappings[0]);
+
+	idStr lang = in_kbd.GetString();
+	consoleKeyMappingIdx = 0;
+
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	consoleKeyMappings[0].key = 0;
+	if ( lang.Length() == 0 || lang.Icmp( "auto") == 0 ) {
+		// auto-detection (SDL2-only)
+		int keycode = SDL_GetKeyFromScancode( SDL_SCANCODE_GRAVE );
+		if ( keycode > 0 && keycode <= 0xFF ) {
+			// the SDL keycode and dhewm3 keycode should be identical for the mappings,
+			// as it's ISO-8859-1 ("High ASCII") chars
+			for( int i=1; i<numMappings; ++i ) {
+				if ( consoleKeyMappings[i].key == keycode ) {
+					consoleKeyMappingIdx = i;
+					common->Printf( "Detected keyboard layout as \"%s\"\n", consoleKeyMappings[i].langName );
+					break;
+				}
+			}
+			if ( consoleKeyMappingIdx == 0 ) { // not found in known mappings
+				consoleKeyMappings[0].key = keycode;
+			}
+		}
+	} else
+#endif
+	{
+		for( int i=1; i<numMappings; ++i ) {
+			if( lang.Icmp( consoleKeyMappings[i].langName ) == 0 ) {
+				consoleKeyMappingIdx = i;
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+				int keycode = SDL_GetKeyFromScancode( SDL_SCANCODE_GRAVE );
+				if ( keycode && keycode != consoleKeyMappings[i].key ) {
+					common->Warning( "in_kbd is set to \"%s\", but the actual keycode of the 'console key' is %c (%d), not %c (%d), so this might not work that well..\n",
+							lang.c_str(), (unsigned char)keycode, keycode, consoleKeyMappings[i].key, consoleKeyMappings[i].key );
+				}
+#endif
+				break;
+			}
+		}
+	}
+}
+
 /*
 ===============
 Sys_GetConsoleKey
 ===============
 */
-unsigned char Sys_GetConsoleKey(bool shifted) {
-	static unsigned char keys[2] = { '`', '~' };
-
-	if (in_kbd.IsModified()) {
-		idStr lang = in_kbd.GetString();
-
-		if (lang.Length()) {
-			if (!lang.Icmp("french")) {
-				keys[0] = '<';
-				keys[1] = '>';
-			} else if (!lang.Icmp("german")) {
-				keys[0] = '^';
-				keys[1] = 176; // °
-			} else if (!lang.Icmp("italian")) {
-				keys[0] = '\\';
-				keys[1] = '|';
-			} else if (!lang.Icmp("spanish")) {
-				keys[0] = 186; // º
-				keys[1] = 170; // ª
-			} else if (!lang.Icmp("turkish")) {
-				keys[0] = '"';
-				keys[1] = 233; // é
-			} else if (!lang.Icmp("norwegian")) {
-				keys[0] = 124; // |
-				keys[1] = 167; // §
-			} else if (!lang.Icmp("brazilian")) {
-				keys[0] = '\'';
-				keys[1] = '"';
-			}
-		}
+unsigned char Sys_GetConsoleKey( bool shifted ) {
 
+	if ( in_ignoreConsoleKey.GetBool() ) {
+		return 0;
+	}
+
+	if ( in_kbd.IsModified() ) {
+		initConsoleKeyMapping();
 		in_kbd.ClearModified();
 	}
 
-	return shifted ? keys[1] : keys[0];
+	return shifted ? consoleKeyMappings[consoleKeyMappingIdx].keyShifted : consoleKeyMappings[consoleKeyMappingIdx].key;
 }
 
 /*
@@ -368,15 +655,13 @@ unsigned char Sys_MapCharForKey(int key) {
 /*
 ===============
 Sys_GrabMouseCursor
+Note: Usually grabbing is handled in idCommonLocal::Frame() -> Sys_GenerateEvents() -> handleMouseGrab()
+      This function should only be used to release the mouse before long operations where
+      common->Frame() won't be called for a while
 ===============
 */
 void Sys_GrabMouseCursor(bool grabIt) {
-	int flags;
-
-	if (grabIt)
-		flags = GRAB_ENABLE | GRAB_HIDECURSOR | GRAB_SETSTATE;
-	else
-		flags = GRAB_SETSTATE;
+	int flags = grabIt ? (GRAB_GRABMOUSE | GRAB_HIDECURSOR | GRAB_RELATIVEMOUSE) : 0;
 
 	GLimp_GrabInput(flags);
 }
@@ -389,7 +674,7 @@ Sys_GetEvent
 sysEvent_t Sys_GetEvent() {
 	SDL_Event ev;
 	sysEvent_t res = { };
-	byte key;
+	int key;
 
 	static const sysEvent_t res_none = { SE_NONE, 0, 0, 0, NULL };
 
@@ -399,7 +684,7 @@ sysEvent_t Sys_GetEvent() {
 
 	if (s[0] != '\0') {
 		res.evType = SE_CHAR;
-		res.evValue = s[s_pos];
+		res.evValue = (unsigned char)s[s_pos];
 
 		++s_pos;
 
@@ -442,15 +727,14 @@ sysEvent_t Sys_GetEvent() {
 					} // new context because visual studio complains about newmod and currentmod not initialized because of the case SDL_WINDOWEVENT_FOCUS_LOST
 
 					
-					common->ActivateTool( false );
-					GLimp_GrabInput(GRAB_ENABLE | GRAB_REENABLE | GRAB_HIDECURSOR); // FIXME: not sure this is still needed after the ActivateTool()-call
+					in_hasFocus = true;
 
 					// start playing the game sound world again (when coming from editor)
 					session->SetPlayingSoundWorld();
 
 					break;
 				case SDL_WINDOWEVENT_FOCUS_LOST:
-					GLimp_GrabInput(0);
+					in_hasFocus = false;
 					break;
 			}
 
@@ -458,10 +742,8 @@ sysEvent_t Sys_GetEvent() {
 #else
 		case SDL_ACTIVEEVENT:
 			{
-				int flags = 0;
-
 				if (ev.active.gain) {
-					flags = GRAB_ENABLE | GRAB_REENABLE | GRAB_HIDECURSOR;
+					in_hasFocus = true;
 
 					// unset modifier, in case alt-tab was used to leave window and ALT is still set
 					// as that can cause fullscreen-toggling when pressing enter...
@@ -471,9 +753,9 @@ sysEvent_t Sys_GetEvent() {
 						newmod |= KMOD_CAPS;
 
 					SDL_SetModState((SDLMod)newmod);
+				} else {
+					in_hasFocus = false;
 				}
-
-				GLimp_GrabInput(flags);
 			}
 
 			continue; // handle next event
@@ -494,15 +776,17 @@ sysEvent_t Sys_GetEvent() {
 #if !SDL_VERSION_ATLEAST(2, 0, 0)
 			key = mapkey(ev.key.keysym.sym);
 			if (!key) {
-				unsigned char c;
-				// check if its an unmapped console key
-				if (ev.key.keysym.unicode == (c = Sys_GetConsoleKey(false))) {
-					key = c;
-				} else if (ev.key.keysym.unicode == (c = Sys_GetConsoleKey(true))) {
-					key = c;
-				} else {
+				if ( !in_ignoreConsoleKey.GetBool() ) {
+					// check if it's an unmapped console key
+					int c = Sys_GetConsoleKey( (ev.key.keysym.mod & KMOD_SHIFT) != 0 );
+					if (ev.key.keysym.unicode == c) {
+						key = c;
+					}
+				}
+				if (!key) {
 					if (ev.type == SDL_KEYDOWN)
-						common->Warning("unmapped SDL key %d (0x%x)", ev.key.keysym.sym, ev.key.keysym.unicode);
+						common->Warning( "unmapped SDL key %d (0x%x) - if possible use SDL2 for better keyboard support",
+						                 ev.key.keysym.sym, ev.key.keysym.unicode );
 					continue; // handle next event
 				}
 			}
@@ -528,12 +812,17 @@ sysEvent_t Sys_GetEvent() {
 				key = mapkey(ev.key.keysym.sym);
 			}
 
-			if(!key) {
-				if (ev.key.keysym.scancode == SDL_SCANCODE_GRAVE) { // TODO: always do this check?
-					key = Sys_GetConsoleKey(true);
-				} else {
+			if ( !in_ignoreConsoleKey.GetBool() && ev.key.keysym.scancode == SDL_SCANCODE_GRAVE ) {
+				// that key between Esc, Tab and 1 is the console key
+				key = K_CONSOLE;
+			}
+
+			if ( !key ) {
+				// if the key couldn't be mapped so far, try to map the scancode to K_SC_*
+				key = getKeynumForSDLscancode(sc);
+				if(!key) {
 					if (ev.type == SDL_KEYDOWN) {
-						common->Warning("unmapped SDL key %d", ev.key.keysym.sym);
+						common->Warning("unmapped SDL key %d (scancode %d)", ev.key.keysym.sym, (int)sc);
 					}
 					continue; // handle next event
 				}
@@ -561,14 +850,24 @@ sysEvent_t Sys_GetEvent() {
 		case SDL_TEXTINPUT:
 			if (ev.text.text[0]) {
 				res.evType = SE_CHAR;
-				res.evValue = ev.text.text[0];
 
-				if (ev.text.text[1] != '\0')
-				{
-					memcpy(s, ev.text.text, SDL_TEXTINPUTEVENT_TEXT_SIZE);
-					s_pos = 1; // pos 0 is returned
+				if ( isAscii(ev.text.text) ) {
+					res.evValue = ev.text.text[0];
+					if ( ev.text.text[1] != '\0' ) {
+						memcpy( s, ev.text.text, SDL_TEXTINPUTEVENT_TEXT_SIZE );
+						s_pos = 1; // pos 0 is returned
+					}
+					return res;
+				} else if( utf8ToISO8859_1( ev.text.text, s, sizeof(s) ) && s[0] != '\0' ) {
+					res.evValue = (unsigned char)s[0];
+					if ( s[1] == '\0' ) {
+						s_pos = 0;
+						s[0] = '\0';
+					} else {
+						s_pos = 1; // pos 0 is returned
+					}
+					return res;
 				}
-				return res;
 			}
 
 			continue; // handle next event
@@ -579,12 +878,18 @@ sysEvent_t Sys_GetEvent() {
 #endif
 
 		case SDL_MOUSEMOTION:
-			res.evType = SE_MOUSE;
-			res.evValue = ev.motion.xrel;
-			res.evValue2 = ev.motion.yrel;
+			if ( in_relativeMouseMode ) {
+				res.evType = SE_MOUSE;
+				res.evValue = ev.motion.xrel;
+				res.evValue2 = ev.motion.yrel;
 
-			mouse_polls.Append(mouse_poll_t(M_DELTAX, ev.motion.xrel));
-			mouse_polls.Append(mouse_poll_t(M_DELTAY, ev.motion.yrel));
+				mouse_polls.Append(mouse_poll_t(M_DELTAX, ev.motion.xrel));
+				mouse_polls.Append(mouse_poll_t(M_DELTAY, ev.motion.yrel));
+			} else {
+				res.evType = SE_MOUSE_ABS;
+				res.evValue = ev.motion.x;
+				res.evValue2 = ev.motion.y;
+			}
 
 			return res;
 
@@ -595,7 +900,7 @@ sysEvent_t Sys_GetEvent() {
 			if (ev.wheel.y > 0) {
 				res.evValue = K_MWHEELUP;
 				mouse_polls.Append(mouse_poll_t(M_DELTAZ, 1));
-			} else {
+			} else if (ev.wheel.y < 0) {
 				res.evValue = K_MWHEELDOWN;
 				mouse_polls.Append(mouse_poll_t(M_DELTAZ, -1));
 			}
@@ -693,17 +998,88 @@ void Sys_ClearEvents() {
 	mouse_polls.SetNum(0, false);
 }
 
+static void handleMouseGrab() {
+
+	// these are the defaults for when the window does *not* have focus
+	// (don't grab in any way)
+	bool showCursor = true;
+	bool grabMouse = false;
+	bool relativeMouse = false;
+
+	// if com_editorActive, release everything, just like when we have no focus
+	if ( in_hasFocus && !com_editorActive ) {
+		// Note: this generally handles fullscreen menus, but not the PDA, because the PDA
+		//       is an ugly hack in gamecode that doesn't go through sessLocal.guiActive.
+		//       It goes through weapon input code or sth? That's also the reason only
+		//       leftclick (fire) works there (no mousewheel..)
+		//       So the PDA will continue to use relative mouse events to set its cursor position.
+		const bool menuActive = ( sessLocal.GetActiveMenu() != NULL );
+
+		if ( menuActive ) {
+			showCursor = false;
+			relativeMouse = false;
+			grabMouse = false; // TODO: or still grab to window? (maybe only if in exclusive fullscreen mode?)
+		} else if ( console->Active() ) {
+			showCursor = true;
+			relativeMouse = grabMouse = false;
+		} else { // in game
+			showCursor = false;
+			grabMouse = relativeMouse = true;
+		}
+
+		in_relativeMouseMode = relativeMouse;
+
+		// if in_nograb is set, in_relativeMouseMode and relativeMouse can disagree
+		// (=> don't enable relative mouse mode in SDL, but still use relative mouse events
+		//  in the game, unless we'd use absolute mousemode anyway)
+		if ( in_nograb.GetBool() ) {
+			grabMouse = relativeMouse = false;
+		}
+	} else {
+		in_relativeMouseMode = false;
+	}
+
+	int flags = 0;
+	if ( !showCursor )
+		flags |= GRAB_HIDECURSOR;
+	if ( grabMouse )
+		flags |= GRAB_GRABMOUSE;
+	if ( relativeMouse )
+		flags |= GRAB_RELATIVEMOUSE;
+
+	GLimp_GrabInput( flags );
+}
+
 /*
 ================
 Sys_GenerateEvents
 ================
 */
 void Sys_GenerateEvents() {
+
+	handleMouseGrab();
+
 	char *s = Sys_ConsoleInput();
 
 	if (s)
 		PushConsoleEvent(s);
 
+#ifndef ID_DEDICATED // doesn't make sense on dedicated server
+	if ( in_grabKeyboard.IsModified() ) {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		SDL_SetHint( SDL_HINT_GRAB_KEYBOARD, in_grabKeyboard.GetString() );
+		if ( in_grabKeyboard.GetBool() ) {
+			common->Printf( "in_grabKeyboard: Will grab the keyboard if mouse is grabbed, so global keyboard-shortcuts (like Alt-Tab or the Windows key) will *not* work\n" );
+		} else {
+			common->Printf( "in_grabKeyboard: Will *not* grab the keyboard if mouse is grabbed, so global keyboard-shortcuts (like Alt-Tab) will still work\n" );
+		}
+#else
+		common->Printf( "Note: SDL1.2 doesn't support in_grabKeyboard (it's always grabbed if mouse is grabbed)\n" );
+#endif
+		in_grabKeyboard.ClearModified();
+	}
+#endif
+
 	SDL_PumpEvents();
 }
 
diff --git a/neo/sys/glimp.cpp b/neo/sys/glimp.cpp
index f977cf2..01667bd 100644
--- a/neo/sys/glimp.cpp
+++ b/neo/sys/glimp.cpp
@@ -36,12 +36,67 @@ If you have questions concerning this license or the applicable additional terms
 #if defined(_WIN32) && defined(ID_ALLOW_TOOLS)
 #include "sys/win32/win_local.h"
 #include <SDL_syswm.h>
+
+// from SDL_windowsopengl.h (internal SDL2 header)
+#ifndef WGL_ARB_pixel_format
+#define WGL_NUMBER_PIXEL_FORMATS_ARB   0x2000
+#define WGL_DRAW_TO_WINDOW_ARB         0x2001
+#define WGL_DRAW_TO_BITMAP_ARB         0x2002
+#define WGL_ACCELERATION_ARB           0x2003
+#define WGL_NEED_PALETTE_ARB           0x2004
+#define WGL_NEED_SYSTEM_PALETTE_ARB    0x2005
+#define WGL_SWAP_LAYER_BUFFERS_ARB     0x2006
+#define WGL_SWAP_METHOD_ARB            0x2007
+#define WGL_NUMBER_OVERLAYS_ARB        0x2008
+#define WGL_NUMBER_UNDERLAYS_ARB       0x2009
+#define WGL_TRANSPARENT_ARB            0x200A
+#define WGL_TRANSPARENT_RED_VALUE_ARB  0x2037
+#define WGL_TRANSPARENT_GREEN_VALUE_ARB 0x2038
+#define WGL_TRANSPARENT_BLUE_VALUE_ARB 0x2039
+#define WGL_TRANSPARENT_ALPHA_VALUE_ARB 0x203A
+#define WGL_TRANSPARENT_INDEX_VALUE_ARB 0x203B
+#define WGL_SHARE_DEPTH_ARB            0x200C
+#define WGL_SHARE_STENCIL_ARB          0x200D
+#define WGL_SHARE_ACCUM_ARB            0x200E
+#define WGL_SUPPORT_GDI_ARB            0x200F
+#define WGL_SUPPORT_OPENGL_ARB         0x2010
+#define WGL_DOUBLE_BUFFER_ARB          0x2011
+#define WGL_STEREO_ARB                 0x2012
+#define WGL_PIXEL_TYPE_ARB             0x2013
+#define WGL_COLOR_BITS_ARB             0x2014
+#define WGL_RED_BITS_ARB               0x2015
+#define WGL_RED_SHIFT_ARB              0x2016
+#define WGL_GREEN_BITS_ARB             0x2017
+#define WGL_GREEN_SHIFT_ARB            0x2018
+#define WGL_BLUE_BITS_ARB              0x2019
+#define WGL_BLUE_SHIFT_ARB             0x201A
+#define WGL_ALPHA_BITS_ARB             0x201B
+#define WGL_ALPHA_SHIFT_ARB            0x201C
+#define WGL_ACCUM_BITS_ARB             0x201D
+#define WGL_ACCUM_RED_BITS_ARB         0x201E
+#define WGL_ACCUM_GREEN_BITS_ARB       0x201F
+#define WGL_ACCUM_BLUE_BITS_ARB        0x2020
+#define WGL_ACCUM_ALPHA_BITS_ARB       0x2021
+#define WGL_DEPTH_BITS_ARB             0x2022
+#define WGL_STENCIL_BITS_ARB           0x2023
+#define WGL_AUX_BUFFERS_ARB            0x2024
+#define WGL_NO_ACCELERATION_ARB        0x2025
+#define WGL_GENERIC_ACCELERATION_ARB   0x2026
+#define WGL_FULL_ACCELERATION_ARB      0x2027
+#define WGL_SWAP_EXCHANGE_ARB          0x2028
+#define WGL_SWAP_COPY_ARB              0x2029
+#define WGL_SWAP_UNDEFINED_ARB         0x202A
+#define WGL_TYPE_RGBA_ARB              0x202B
+#define WGL_TYPE_COLORINDEX_ARB        0x202C
+#endif
+
+#ifndef WGL_ARB_multisample
+#define WGL_SAMPLE_BUFFERS_ARB         0x2041
+#define WGL_SAMPLES_ARB                0x2042
 #endif
 
-idCVar in_nograb("in_nograb", "0", CVAR_SYSTEM | CVAR_NOCHEAT, "prevents input grabbing");
-idCVar r_waylandcompat("r_waylandcompat", "0", CVAR_SYSTEM | CVAR_NOCHEAT | CVAR_ARCHIVE, "wayland compatible framebuffer");
+#endif // _WIN32 and ID_ALLOW_TOOLS
 
-static bool grabbed = false;
 
 #if SDL_VERSION_ATLEAST(2, 0, 0)
 static SDL_Window *window = NULL;
@@ -107,11 +162,38 @@ bool GLimp_Init(glimpParms_t parms) {
 			flags |= SDL_WINDOW_FULLSCREEN;
 	}
 
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	/* Doom3 has the nasty habit of modifying the default framebuffer's alpha channel and then
+	 * relying on those modifications in blending operations (using GL_DST_(ONE_MINUS_)ALPHA).
+	 * So far that hasn't been much of a problem, because Windows, macOS, X11 etc
+	 * just ignore the alpha chan (unless maybe you explicitly tell a window it should be transparent).
+	 * Unfortunately, Wayland by default *does* use the alpha channel, which often leads to
+	 * rendering bugs (the window is partly transparent or very white in areas with low alpha).
+	 * Mesa introduced an EGL extension that's supposed to fix that (EGL_EXT_present_opaque)
+	 * and newer SDL2 versions use it by default (in the Wayland backend).
+	 * Unfortunately, the implementation of that extension is (currently?) broken (at least
+	 * in Mesa), seems like they just give you a visual without any alpha chan - which doesn't
+	 * work for Doom3, as it needs a functioning alpha chan for blending operations, see above.
+	 * See also: https://gitlab.freedesktop.org/mesa/mesa/-/issues/5886
+	 *
+	 * So to make sure dhewm3 (finally) works as expected on Wayland, we tell SDL2 to
+	 * allow transparency and then fill the alpha-chan ourselves in RB_SwapBuffers()
+	 * (unless the user disables that with r_fillWindowAlphaChan 0) */
+  #ifdef SDL_HINT_VIDEO_EGL_ALLOW_TRANSPARENCY
+	SDL_SetHint(SDL_HINT_VIDEO_EGL_ALLOW_TRANSPARENCY, "1");
+  #else // little hack so this works if the SDL2 version used for building is older than runtime version
+	SDL_SetHint("SDL_VIDEO_EGL_ALLOW_TRANSPARENCY", "1");
+  #endif
+#endif
+
 	int colorbits = 24;
 	int depthbits = 24;
 	int stencilbits = 8;
 
 	for (int i = 0; i < 16; i++) {
+
+		int multisamples = parms.multiSamples;
+
 		// 0 - default
 		// 1 - minus colorbits
 		// 2 - minus depthbits
@@ -168,6 +250,10 @@ bool GLimp_Init(glimpParms_t parms) {
 		if (tcolorbits == 24)
 			channelcolorbits = 8;
 
+		int talphabits = channelcolorbits;
+
+try_again:
+
 		SDL_GL_SetAttribute(SDL_GL_RED_SIZE, channelcolorbits);
 		SDL_GL_SetAttribute(SDL_GL_GREEN_SIZE, channelcolorbits);
 		SDL_GL_SetAttribute(SDL_GL_BLUE_SIZE, channelcolorbits);
@@ -175,37 +261,50 @@ bool GLimp_Init(glimpParms_t parms) {
 		SDL_GL_SetAttribute(SDL_GL_DEPTH_SIZE, tdepthbits);
 		SDL_GL_SetAttribute(SDL_GL_STENCIL_SIZE, tstencilbits);
 
-		if (r_waylandcompat.GetBool())
-			SDL_GL_SetAttribute(SDL_GL_ALPHA_SIZE, 0);
-		else
-			SDL_GL_SetAttribute(SDL_GL_ALPHA_SIZE, channelcolorbits);
+		SDL_GL_SetAttribute(SDL_GL_ALPHA_SIZE, talphabits);
 
 		SDL_GL_SetAttribute(SDL_GL_STEREO, parms.stereo ? 1 : 0);
 
-		SDL_GL_SetAttribute(SDL_GL_MULTISAMPLEBUFFERS, parms.multiSamples ? 1 : 0);
-		SDL_GL_SetAttribute(SDL_GL_MULTISAMPLESAMPLES, parms.multiSamples);
+		SDL_GL_SetAttribute(SDL_GL_MULTISAMPLEBUFFERS, (multisamples > 1) ? 1 : 0);
+		SDL_GL_SetAttribute(SDL_GL_MULTISAMPLESAMPLES, multisamples);
 
 #if SDL_VERSION_ATLEAST(2, 0, 0)
-		int displayIndex = 0;
 
+		const char* windowMode = "";
+		if(r_fullscreen.GetBool()) {
+			windowMode = r_fullscreenDesktop.GetBool() ? "desktop-fullscreen-" : "fullscreen-";
+		}
+
+		common->Printf("Will create a %swindow with resolution %dx%d (r_mode = %d)\n",
+		               windowMode, parms.width, parms.height, r_mode.GetInteger());
+
+		int displayIndex = 0;
+#if SDL_VERSION_ATLEAST(2, 0, 4)
 		// try to put the window on the display the mousecursor currently is on
 		{
 			int x, y;
 			SDL_GetGlobalMouseState(&x, &y);
 
 			int numDisplays = SDL_GetNumVideoDisplays();
-			for (i=0; i<numDisplays; ++i) {
+			common->Printf("SDL detected %d displays:\n", numDisplays);
+			bool found = false;
+			for ( int j=0; j<numDisplays; ++j ) {
 				SDL_Rect rect;
-				if (SDL_GetDisplayBounds(i, &rect) == 0) {
-					if (   x >= rect.x && x < rect.x + rect.w
+				if (SDL_GetDisplayBounds(j, &rect) == 0) {
+					common->Printf(" %d: %dx%d at (%d, %d) to (%d, %d)\n", j, rect.w, rect.h,
+					               rect.x, rect.y, rect.x+rect.w, rect.y+rect.h);
+					if ( !found && x >= rect.x && x < rect.x + rect.w
 						&& y >= rect.y && y < rect.y + rect.h )
 					{
-						displayIndex = i;
-						break;
+						displayIndex = j;
+						found = true;
 					}
 				}
 			}
+			common->Printf("Will use display %d because mouse cursor is at (%d, %d).\n",
+			               displayIndex, x, y);
 		}
+#endif
 
 		window = SDL_CreateWindow(ENGINE_VERSION,
 									SDL_WINDOWPOS_UNDEFINED_DISPLAY(displayIndex),
@@ -213,9 +312,23 @@ bool GLimp_Init(glimpParms_t parms) {
 									parms.width, parms.height, flags);
 
 		if (!window) {
-			common->DPrintf("Couldn't set GL mode %d/%d/%d: %s",
-							channelcolorbits, tdepthbits, tstencilbits, SDL_GetError());
+			common->Warning("Couldn't set GL mode %d/%d/%d with %dx MSAA: %s",
+							channelcolorbits, tdepthbits, tstencilbits, multisamples, SDL_GetError());
+
+			// before trying to reduce color channel size or whatever, first try reducing MSAA, if possible
+			if(multisamples > 1) {
+				multisamples = (multisamples <= 2) ? 0 : (multisamples/2);
+
+				// using goto because enhancing that logic which reduces attributes
+				// based on i (so it'd first try reducing MSAA) would be too painful
+				goto try_again;
+			}
+
 			continue;
+		} else {
+			// creating the window succeeded, so adjust r_multiSamples to the value that was actually used
+			parms.multiSamples = multisamples;
+			r_multiSamples.SetInteger(multisamples);
 		}
 
 		/* Check if we're really in the requested display mode. There is
@@ -236,6 +349,10 @@ bool GLimp_Init(glimpParms_t parms) {
 			{
 				common->Warning("Current display mode isn't requested display mode\n");
 				common->Warning("Likely SDL bug #4700, trying to work around it..\n");
+				int dIdx = SDL_GetWindowDisplayIndex(window);
+				if(dIdx != displayIndex) {
+					common->Warning("Window's display index is %d, but we wanted %d!\n", dIdx, displayIndex);
+				}
 
 				/* Mkay, try to hack around that. */
 				SDL_DisplayMode wanted_mode = {};
@@ -305,7 +422,21 @@ bool GLimp_Init(glimpParms_t parms) {
 		if (!window) {
 			common->DPrintf("Couldn't set GL mode %d/%d/%d: %s",
 							channelcolorbits, tdepthbits, tstencilbits, SDL_GetError());
+
+			// before trying to reduce color channel size or whatever, first try reducing MSAA, if possible
+			if(multisamples > 1) {
+				multisamples = (multisamples <= 2) ? 0 : (multisamples/2);
+
+				// using goto because enhancing that logic which reduces attributes
+				// based on i (so it'd first try reducing MSAA) would be too painful
+				goto try_again;
+			}
+
 			continue;
+		} else {
+			// creating the window succeeded, so adjust r_multiSamples to the value that was actually used
+			parms.multiSamples = multisamples;
+			r_multiSamples.SetInteger(multisamples);
 		}
 
 		glConfig.vidWidth = window->w;
@@ -316,7 +447,7 @@ bool GLimp_Init(glimpParms_t parms) {
 
 #if defined(_WIN32) && defined(ID_ALLOW_TOOLS)
 
-#ifndef SDL_VERSION_ATLEAST(2, 0, 0)
+#if ! SDL_VERSION_ATLEAST(2, 0, 0)
 	#error "dhewm3 only supports the tools with SDL2, not SDL1!"
 #endif
 
@@ -335,43 +466,107 @@ bool GLimp_Init(glimpParms_t parms) {
 			// NOTE: hInstance is set in main()
 			win32.hGLRC = qwglGetCurrentContext();
 
-			PIXELFORMATDESCRIPTOR src =
+			int pfIdx = GetPixelFormat(win32.hDC);
+			// query the pixel format SDL actually chose; fall back to hardcoded defaults if that fails
+			if (DescribePixelFormat(win32.hDC, pfIdx, sizeof(PIXELFORMATDESCRIPTOR), &win32.pfd) == 0)
 			{
-				sizeof(PIXELFORMATDESCRIPTOR),	// size of this pfd
-				1,								// version number
-				PFD_DRAW_TO_WINDOW |			// support window
-				PFD_SUPPORT_OPENGL |			// support OpenGL
-				PFD_DOUBLEBUFFER,				// double buffered
-				PFD_TYPE_RGBA,					// RGBA type
-				32,								// 32-bit color depth
-				0, 0, 0, 0, 0, 0,				// color bits ignored
-				8,								// 8 bit destination alpha
-				0,								// shift bit ignored
-				0,								// no accumulation buffer
-				0, 0, 0, 0, 					// accum bits ignored
-				24,								// 24-bit z-buffer	
-				8,								// 8-bit stencil buffer
-				0,								// no auxiliary buffer
-				PFD_MAIN_PLANE,					// main layer
-				0,								// reserved
-				0, 0, 0							// layer masks ignored
-			};
-			memcpy(&win32.pfd, &src, sizeof(PIXELFORMATDESCRIPTOR));
+				common->Warning("DescribePixelFormat() failed: %d!\n", (int)GetLastError());
+				PIXELFORMATDESCRIPTOR src =
+				{
+					sizeof(PIXELFORMATDESCRIPTOR),	// size of this pfd
+					1,								// version number
+					PFD_DRAW_TO_WINDOW |			// support window
+					PFD_SUPPORT_OPENGL |			// support OpenGL
+					PFD_DOUBLEBUFFER,				// double buffered
+					PFD_TYPE_RGBA,					// RGBA type
+					32,								// 32-bit color depth
+					0, 0, 0, 0, 0, 0,				// color bits ignored
+					8,								// 8 bit destination alpha
+					0,								// shift bit ignored
+					0,								// no accumulation buffer
+					0, 0, 0, 0, 					// accum bits ignored
+					24,								// 24-bit z-buffer
+					8,								// 8-bit stencil buffer
+					0,								// no auxiliary buffer
+					PFD_MAIN_PLANE,					// main layer
+					0,								// reserved
+					0, 0, 0							// layer masks ignored
+				};
+				memcpy(&win32.pfd, &src, sizeof(PIXELFORMATDESCRIPTOR));
+			}
+			
+			win32.piAttribIList = NULL;
+
+			win32.wglGetPixelFormatAttribivARB = (BOOL(WINAPI*)(HDC,int,int,UINT,const int*,int*))SDL_GL_GetProcAddress("wglGetPixelFormatAttribivARB");
+			win32.wglChoosePixelFormatARB = (BOOL(WINAPI*)(HDC,const int*,const FLOAT*,UINT,int*piFormats,UINT*))SDL_GL_GetProcAddress("wglChoosePixelFormatARB");
+
+			if(win32.wglGetPixelFormatAttribivARB != NULL && win32.wglChoosePixelFormatARB != NULL) {
+				const int queryAttributes[] = {
+					// equivalents of all the SDL_GL_* attributes we set above (and ones set implicitly)
+					WGL_DRAW_TO_WINDOW_ARB,
+					WGL_RED_BITS_ARB,
+					WGL_GREEN_BITS_ARB,
+					WGL_BLUE_BITS_ARB,
+					WGL_ALPHA_BITS_ARB,
+					WGL_DOUBLE_BUFFER_ARB,
+					WGL_DEPTH_BITS_ARB,
+					WGL_STENCIL_BITS_ARB,
+					// WGL_ACCUM_*_BITS_ARB - not used
+					WGL_STEREO_ARB,
+					WGL_SAMPLE_BUFFERS_ARB,
+					WGL_SAMPLES_ARB,
+					// WGL_FRAMEBUFFER_SRGB_CAPABLE_ARB - not used
+					WGL_ACCELERATION_ARB,
+				};
+				enum { NUM_ATTRIBUTES = sizeof(queryAttributes)/sizeof(queryAttributes[0]) };
+				int queryResults[NUM_ATTRIBUTES] = {};
+				
+				win32.wglGetPixelFormatAttribivARB(win32.hDC, pfIdx, PFD_MAIN_PLANE, NUM_ATTRIBUTES, queryAttributes, queryResults);
+				
+				static int attribIList[2*NUM_ATTRIBUTES+2] = {}; // +2 for terminating 0, 0 pair
+				for(int i=0; i<NUM_ATTRIBUTES; ++i) {
+					attribIList[i*2] = queryAttributes[i];
+					attribIList[i*2+1] = queryResults[i];
+				}
+				win32.piAttribIList = attribIList;
+			}
 		} else {
 			// TODO: can we just disable them?
 			common->Error("SDL_GetWindowWMInfo(), which is needed for Tools to work, failed!");
 		}		
 #endif // defined(_WIN32) && defined(ID_ALLOW_TOOLS)
 
-		common->Printf("Using %d color bits, %d depth, %d stencil display\n",
-						channelcolorbits, tdepthbits, tstencilbits);
+		common->Printf("Requested %d color bits per chan, %d alpha, %d depth, %d stencil\n",
+						channelcolorbits, talphabits, tdepthbits, tstencilbits);
 
-		glConfig.colorBits = tcolorbits;
-		glConfig.depthBits = tdepthbits;
-		glConfig.stencilBits = tstencilbits;
+		{
+			int r, g, b, a, d, s;
+			SDL_GL_GetAttribute(SDL_GL_RED_SIZE, &r);
+			SDL_GL_GetAttribute(SDL_GL_GREEN_SIZE, &g);
+			SDL_GL_GetAttribute(SDL_GL_BLUE_SIZE, &b);
+			SDL_GL_GetAttribute(SDL_GL_ALPHA_SIZE, &a);
+			SDL_GL_GetAttribute(SDL_GL_DEPTH_SIZE, &d);
+			SDL_GL_GetAttribute(SDL_GL_STENCIL_SIZE, &s);
+
+			common->Printf("Got %d stencil bits, %d depth bits, color bits: r%d g%d b%d a%d\n", s, d, r, g, b, a);
+
+			glConfig.colorBits = r+g+b; // a bit imprecise, but seems to be used only in GfxInfo_f()
+			glConfig.alphabits = a;
+			glConfig.depthBits = d;
+			glConfig.stencilBits = s;
+		}
 
 		glConfig.displayFrequency = 0;
 
+		// for r_fillWindowAlphaChan -1, see also the big comment above
+		glConfig.shouldFillWindowAlpha = false;
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		const char* videoDriver = SDL_GetCurrentVideoDriver();
+		if (idStr::Icmp(videoDriver, "wayland") == 0) {
+			glConfig.shouldFillWindowAlpha = true;
+		}
+#endif
+
 		break;
 	}
 
@@ -427,6 +622,12 @@ void GLimp_SwapBuffers() {
 #endif
 }
 
+static bool gammaOrigError = false;
+static bool gammaOrigSet = false;
+static unsigned short gammaOrigRed[256];
+static unsigned short gammaOrigGreen[256];
+static unsigned short gammaOrigBlue[256];
+
 /*
 =================
 GLimp_SetGamma
@@ -438,6 +639,19 @@ void GLimp_SetGamma(unsigned short red[256], unsigned short green[256], unsigned
 		return;
 	}
 
+	if ( !gammaOrigSet ) {
+		gammaOrigSet = true;
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		if ( SDL_GetWindowGammaRamp( window, gammaOrigRed, gammaOrigGreen, gammaOrigBlue ) == -1 ) {
+#else
+		if ( SDL_GetGammaRamp( gammaOrigRed, gammaOrigGreen, gammaOrigBlue ) == -1 ) {
+#endif
+			gammaOrigError = true;
+			common->Warning( "Failed to get Gamma Ramp: %s\n", SDL_GetError() );
+		}
+	}
+
+
 #if SDL_VERSION_ATLEAST(2, 0, 0)
 	if (SDL_SetWindowGammaRamp(window, red, green, blue))
 #else
@@ -446,6 +660,30 @@ void GLimp_SetGamma(unsigned short red[256], unsigned short green[256], unsigned
 		common->Warning("Couldn't set gamma ramp: %s", SDL_GetError());
 }
 
+/*
+=================
+GLimp_ResetGamma
+
+Restore original system gamma setting
+=================
+*/
+void GLimp_ResetGamma() {
+	if( gammaOrigError ) {
+		common->Warning( "Can't reset hardware gamma because getting the Gamma Ramp at startup failed!\n" );
+		common->Warning( "You might have to restart the game for gamma/brightness in shaders to work properly.\n" );
+		return;
+	}
+
+	if( gammaOrigSet ) {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+		SDL_SetWindowGammaRamp( window, gammaOrigRed, gammaOrigGreen, gammaOrigBlue );
+#else
+		SDL_SetGammaRamp( gammaOrigRed, gammaOrigGreen, gammaOrigBlue );
+#endif
+	}
+}
+
+
 /*
 =================
 GLimp_ActivateContext
@@ -476,28 +714,19 @@ GLExtension_t GLimp_ExtensionPointer(const char *name) {
 }
 
 void GLimp_GrabInput(int flags) {
-	bool grab = flags & GRAB_ENABLE;
-
-	if (grab && (flags & GRAB_REENABLE))
-		grab = false;
-
-	if (flags & GRAB_SETSTATE)
-		grabbed = grab;
-
-	if (in_nograb.GetBool())
-		grab = false;
-
 	if (!window) {
 		common->Warning("GLimp_GrabInput called without window");
 		return;
 	}
 
 #if SDL_VERSION_ATLEAST(2, 0, 0)
-	SDL_ShowCursor(flags & GRAB_HIDECURSOR ? SDL_DISABLE : SDL_ENABLE);
-	SDL_SetRelativeMouseMode((grab && (flags & GRAB_HIDECURSOR)) ? SDL_TRUE : SDL_FALSE);
-	SDL_SetWindowGrab(window, grab ? SDL_TRUE : SDL_FALSE);
+	SDL_ShowCursor( (flags & GRAB_HIDECURSOR) ? SDL_DISABLE : SDL_ENABLE );
+	SDL_SetRelativeMouseMode( (flags & GRAB_RELATIVEMOUSE) ? SDL_TRUE : SDL_FALSE );
+	SDL_SetWindowGrab( window, (flags & GRAB_GRABMOUSE) ? SDL_TRUE : SDL_FALSE );
 #else
-	SDL_ShowCursor(flags & GRAB_HIDECURSOR ? SDL_DISABLE : SDL_ENABLE);
-	SDL_WM_GrabInput(grab ? SDL_GRAB_ON : SDL_GRAB_OFF);
+	SDL_ShowCursor( (flags & GRAB_HIDECURSOR) ? SDL_DISABLE : SDL_ENABLE );
+	// ignore GRAB_GRABMOUSE, SDL1.2 doesn't support grabbing without relative mode
+	// so only grab if we want relative mode
+	SDL_WM_GrabInput( (flags & GRAB_RELATIVEMOUSE) ? SDL_GRAB_ON : SDL_GRAB_OFF );
 #endif
 }
diff --git a/neo/sys/linux/main.cpp b/neo/sys/linux/main.cpp
index 5e08e2d..28a8a9b 100644
--- a/neo/sys/linux/main.cpp
+++ b/neo/sys/linux/main.cpp
@@ -42,14 +42,152 @@ If you have questions concerning this license or the applicable additional terms
 
 #include <locale.h>
 
-static char path_argv[MAX_OSPATH];
+
+
+#undef snprintf // no, I don't want to use idStr::snPrintf() here.
+
+// lots of code following to get the current executable dir, taken from Yamagi Quake II
+// and actually based on DG_Snippets.h
+
+#if defined(__linux) || defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__)
+#include <unistd.h> // readlink(), amongst others
+#endif
+
+#if defined(__FreeBSD__) || defined(__NetBSD__) || defined(__DragonFly__)
+#include <sys/sysctl.h> // for sysctl() to get path to executable
+#endif
+
+#ifdef _WIN32
+#include <windows.h> // GetModuleFileNameW()
+#endif
+
+#ifdef __APPLE__
+#include <mach-o/dyld.h> // _NSGetExecutablePath
+#endif
+
+#ifdef __HAIKU__
+#include <FindDirectory.h>
+#endif
+
+#ifndef PATH_MAX
+// this is mostly for windows. windows has a MAX_PATH = 260 #define, but allows
+// longer paths anyway.. this might not be the maximum allowed length, but is
+// hopefully good enough for realistic usecases
+#define PATH_MAX 4096
+#endif
+
+static char path_argv[PATH_MAX];
+static char path_exe[PATH_MAX];
+static char save_path[PATH_MAX];
+
+const char* Posix_GetSavePath()
+{
+	return save_path;
+}
+
+static void SetSavePath()
+{
+	const char* s = getenv("XDG_DATA_HOME");
+	if (s)
+		D3_snprintfC99(save_path, sizeof(save_path), "%s/dhewm3", s);
+	else
+		D3_snprintfC99(save_path, sizeof(save_path), "%s/.local/share/dhewm3", getenv("HOME"));
+}
+
+const char* Posix_GetExePath()
+{
+	return path_exe;
+}
+
+static void SetExecutablePath(char* exePath)
+{
+	// !!! this assumes that exePath can hold PATH_MAX chars !!!
+
+#ifdef _WIN32
+	WCHAR wexePath[PATH_MAX];
+	DWORD len;
+
+	GetModuleFileNameW(NULL, wexePath, PATH_MAX);
+	len = WideCharToMultiByte(CP_UTF8, 0, wexePath, -1, exePath, PATH_MAX, NULL, NULL);
+
+	if(len <= 0 || len == PATH_MAX)
+	{
+		// an error occurred, clear exe path
+		exePath[0] = '\0';
+	}
+
+#elif defined(__linux)
+
+	// all the platforms that have /proc/$pid/exe or similar that symlink the
+	// real executable - basically Linux and the BSDs except for FreeBSD which
+	// doesn't enable proc by default and has a sysctl() for this. OpenBSD once
+	// had /proc but removed it for security reasons.
+	char buf[PATH_MAX] = {0};
+	snprintf(buf, sizeof(buf), "/proc/%d/exe", getpid());
+	// readlink() doesn't null-terminate!
+	int len = readlink(buf, exePath, PATH_MAX-1);
+	if (len <= 0)
+	{
+		// an error occurred, clear exe path
+		exePath[0] = '\0';
+	}
+	else
+	{
+		exePath[len] = '\0';
+	}
+
+#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__DragonFly__)
+
+	// the sysctl should also work when /proc/ is not mounted (which seems to
+	// be common on FreeBSD), so use it..
+#if defined(__FreeBSD__) || defined(__DragonFly__)
+	int name[4] = {CTL_KERN, KERN_PROC, KERN_PROC_PATHNAME, -1};
+#else
+	int name[4] = {CTL_KERN, KERN_PROC_ARGS, -1, KERN_PROC_PATHNAME};
+#endif
+	size_t len = PATH_MAX-1;
+	int ret = sysctl(name, sizeof(name)/sizeof(name[0]), exePath, &len, NULL, 0);
+	if(ret != 0)
+	{
+		// an error occurred, clear exe path
+		exePath[0] = '\0';
+	}
+
+#elif defined(__APPLE__)
+
+	uint32_t bufSize = PATH_MAX;
+	if(_NSGetExecutablePath(exePath, &bufSize) != 0)
+	{
+		// WTF, PATH_MAX is not enough to hold the path?
+		// an error occurred, clear exe path
+		exePath[0] = '\0';
+	}
+
+	// TODO: realpath() ?
+	// TODO: no idea what this is if the executable is in an app bundle
+#elif defined(__HAIKU__)
+	if (find_path(B_APP_IMAGE_SYMBOL, B_FIND_PATH_IMAGE_PATH, NULL, exePath, PATH_MAX) != B_OK)
+	{
+		exePath[0] = '\0';
+	}
+
+#else
+
+	// Several platforms (for example OpenBSD) don't provide a
+	// reliable way to determine the executable path. Just return
+	// an empty string.
+	exePath[0] = '\0';
+
+// feel free to add implementation for your platform and send a pull request.
+#warning "SetExecutablePath() is unimplemented on this platform"
+
+#endif
+}
 
 bool Sys_GetPath(sysPath_t type, idStr &path) {
 	const char *s;
 	char buf[MAX_OSPATH];
-	char buf2[MAX_OSPATH];
 	struct stat st;
-	size_t len;
 
 	path.Clear();
 
@@ -102,30 +240,15 @@ bool Sys_GetPath(sysPath_t type, idStr &path) {
 		return true;
 
 	case PATH_SAVE:
-		s = getenv("XDG_DATA_HOME");
-		if (s)
-			idStr::snPrintf(buf, sizeof(buf), "%s/dhewm3", s);
-		else
-			idStr::snPrintf(buf, sizeof(buf), "%s/.local/share/dhewm3", getenv("HOME"));
-
-		path = buf;
-		return true;
-
-	case PATH_EXE:
-		idStr::snPrintf(buf, sizeof(buf), "/proc/%d/exe", getpid());
-		len = readlink(buf, buf2, sizeof(buf2));
-		if (len != -1) {
-			if (len < MAX_OSPATH) {
-				buf2[len] = '\0';
-			} else {
-				buf2[MAX_OSPATH - 1] = '\0';
-			}
-			path = buf2;
+		if(save_path[0] != '\0') {
+			path = save_path;
 			return true;
 		}
+		return false;
 
-		if (path_argv[0] != 0) {
-			path = path_argv;
+	case PATH_EXE:
+		if (path_exe[0] != '\0') {
+			path = path_exe;
 			return true;
 		}
 
@@ -285,6 +408,15 @@ main
 ===============
 */
 int main(int argc, char **argv) {
+	// Prevent running Doom 3 as root
+	// Borrowed from Yamagi Quake II
+	if (getuid() == 0) {
+		printf("Doom 3 shouldn't be run as root! Backing out to save your ass. If\n");
+		printf("you really know what you're doing, edit neo/sys/linux/main.cpp and remove\n");
+		printf("this check. But don't complain if an imp kills your bunny afterwards!:)\n");
+
+		return 1;
+	}
 	// fallback path to the binary for systems without /proc
 	// while not 100% reliable, its good enough
 	if (argc > 0) {
@@ -294,6 +426,13 @@ int main(int argc, char **argv) {
 		path_argv[0] = 0;
 	}
 
+	SetExecutablePath(path_exe);
+	if (path_exe[0] == '\0') {
+		memcpy(path_exe, path_argv, sizeof(path_exe));
+	}
+
+	SetSavePath();
+
 	// some ladspa-plugins (that may be indirectly loaded by doom3 if they're
 	// used by alsa) call setlocale(LC_ALL, ""); This sets LC_ALL to $LANG or
 	// $LC_ALL which usually is not "C" and will fuck up scanf, strtod
diff --git a/neo/sys/osx/DOOMController.mm b/neo/sys/osx/DOOMController.mm
index 70e3162..5d36338 100644
--- a/neo/sys/osx/DOOMController.mm
+++ b/neo/sys/osx/DOOMController.mm
@@ -36,34 +36,41 @@ If you have questions concerning this license or the applicable additional terms
 #include <SDL_main.h>
 
 #include "sys/platform.h"
+
+#include <sys/types.h>
+#include <sys/sysctl.h>
+
 #include "idlib/Str.h"
 #include "framework/Common.h"
 
 #include "sys/posix/posix_public.h"
 
-bool Sys_GetPath(sysPath_t type, idStr &path) {
-	char buf[MAXPATHLEN];
-	char *snap;
+static char base_path[MAXPATHLEN];
+static char exe_path[MAXPATHLEN];
+static char save_path[MAXPATHLEN];
 
+
+const char* Posix_GetExePath() {
+	return exe_path;
+}
+
+const char* Posix_GetSavePath() {
+	return save_path;
+}
+
+bool Sys_GetPath(sysPath_t type, idStr &path) {
 	switch(type) {
 	case PATH_BASE:
-		strncpy(buf, [ [ [ NSBundle mainBundle ] bundlePath ] cString ], MAXPATHLEN );
-		snap = strrchr(buf, '/');
-		if (snap)
-			*snap = '\0';
-
-		path = buf;
+		path = base_path;
 		return true;
 
 	case PATH_CONFIG:
 	case PATH_SAVE:
-		sprintf(buf, "%s/Library/Application Support/dhewm3", [NSHomeDirectory() cString]);
-		path = buf;
+		path = save_path;
 		return true;
 
 	case PATH_EXE:
-		strncpy(buf, [ [ [ NSBundle mainBundle ] bundlePath ] cString ], MAXPATHLEN);
-		path = buf;
+		path = exe_path;
 		return true;
 	}
 
@@ -86,10 +93,16 @@ returns in megabytes
 ================
 */
 int Sys_GetSystemRam( void ) {
-	SInt32 ramSize;
-
-	if ( Gestalt( gestaltPhysicalRAMSize, &ramSize ) == noErr ) {
-		return ramSize / (1024*1024);
+	// from https://discussions.apple.com/thread/1775836?answerId=8396559022#8396559022
+	// should work (at least) from the Mac OSX 10.2.8 SDK on
+	int mib[2];
+	uint64_t memsize;
+	size_t len;
+	mib[0] = CTL_HW;
+	mib[1] = HW_MEMSIZE; /* uint64_t: physical ram size */
+	len = sizeof(memsize);
+	if(sysctl(mib, 2, &memsize, &len, NULL, 0) == 0) {
+		return (int)(memsize / (1024*1024));
 	}
 	else
 		return 1024;
@@ -180,6 +193,19 @@ int SDL_main( int argc, char *argv[] ) {
 	if (![[NSFileManager defaultManager] changeCurrentDirectoryPath:[[NSBundle mainBundle] resourcePath]])
 		Sys_Error("Could not access application resources");
 
+	// DG: set exe_path so Posix_InitSignalHandlers() can call Posix_GetExePath()
+	SDL_strlcpy(exe_path, [ [ [ NSBundle mainBundle ] bundlePath ] cString ], sizeof(exe_path));
+	// same for save_path for Posix_GetSavePath()
+	D3_snprintfC99(save_path, sizeof(save_path), "%s/Library/Application Support/dhewm3", [NSHomeDirectory() cString]);
+	// and preinitializing basepath is easy enough so do that as well
+	{
+		char* snap;
+		SDL_strlcpy(base_path, exe_path, sizeof(base_path));
+		snap = strrchr(base_path, '/');
+		if (snap)
+			*snap = '\0';
+	}
+
 	Posix_InitSignalHandlers(); // DG: added signal handlers for POSIX platforms
 
 	if (argc > 1)
diff --git a/neo/sys/osx/SDLMain.m b/neo/sys/osx/SDLMain.m
index 2c148e2..5599789 100644
--- a/neo/sys/osx/SDLMain.m
+++ b/neo/sys/osx/SDLMain.m
@@ -5,12 +5,17 @@
     Feel free to customize this file to suit your needs
 */
 
+#if defined(__clang__)
+#pragma clang diagnostic push
+#pragma clang diagnostic ignored "-Wdeprecated-declarations"
+#endif
+
 #include "SDL.h"
 #include "SDLMain.h"
 #include <sys/param.h> /* for MAXPATHLEN */
 #include <unistd.h>
 
-/* For some reaon, Apple removed setAppleMenu from the headers in 10.4,
+/* For some reason, Apple removed setAppleMenu from the headers in 10.4,
  but the method still is there and works. To avoid warnings, we declare
  it ourselves here. */
 @interface NSApplication(SDL_Missing_Methods)
@@ -221,7 +226,7 @@ static void CustomApplicationMain (int argc, char **argv)
 
     /* Create SDLMain and make it the app delegate */
     sdlMain = [[SDLMain alloc] init];
-    [NSApp setDelegate:sdlMain];
+    [NSApp setDelegate:(id<NSApplicationDelegate>)sdlMain];
 
     /* Start the main event loop */
     [NSApp run];
@@ -378,3 +383,7 @@ int main (int argc, char **argv)
 #endif
     return 0;
 }
+
+#if defined(__clang__)
+#pragma clang diagnostic pop
+#endif
diff --git a/neo/sys/platform.h b/neo/sys/platform.h
index 4e78868..e4ae34c 100644
--- a/neo/sys/platform.h
+++ b/neo/sys/platform.h
@@ -40,7 +40,7 @@ If you have questions concerning this license or the applicable additional terms
 ===============================================================================
 */
 
-// Win32
+// AROS
 #if defined(__AROS__)
 
 #define _alloca						alloca
@@ -200,6 +200,18 @@ If you have questions concerning this license or the applicable additional terms
 #undef FindText								// stupid namespace poluting Microsoft monkeys
 #endif
 
+// Apple legacy
+#ifdef __APPLE__
+#include <Availability.h>
+#ifdef __MAC_OS_X_VERSION_MIN_REQUIRED
+#if __MAC_OS_X_VERSION_MIN_REQUIRED == 1040
+#define OSX_TIGER
+#elif __MAC_OS_X_VERSION_MIN_REQUIRED < 1060
+#define OSX_LEOPARD
+#endif
+#endif
+#endif
+
 #define ID_TIME_T time_t
 
 typedef unsigned char			byte;		// 8 bits
diff --git a/neo/sys/posix/posix_main.cpp b/neo/sys/posix/posix_main.cpp
index 0454020..ee975d3 100644
--- a/neo/sys/posix/posix_main.cpp
+++ b/neo/sys/posix/posix_main.cpp
@@ -49,6 +49,8 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "sys/posix/posix_public.h"
 
+#include <SDL.h> // clipboard
+
 #define					COMMAND_HISTORY 64
 
 static int				input_hide = 0;
@@ -76,6 +78,15 @@ idCVar com_pid( "com_pid", "0", CVAR_INTEGER | CVAR_INIT | CVAR_SYSTEM, "process
 static int set_exit = 0;
 static char exit_spawn[ 1024 ] = { 0 };
 
+static FILE* consoleLog = NULL;
+void Sys_VPrintf(const char *msg, va_list arg);
+
+#ifdef snprintf
+  // I actually wanna use real snprintf here, not idStr::snPrintf(),
+  // so get rid of the use_idStr_snPrintf #define
+  #undef snprintf
+#endif
+
 /*
 ================
 Posix_Exit
@@ -93,6 +104,12 @@ void Posix_Exit(int ret) {
 	if ( exit_spawn[0] ) {
 		Sys_DoStartProcess( exit_spawn, false );
 	}
+
+	if(consoleLog != NULL) {
+		fclose(consoleLog);
+		consoleLog = NULL;
+	}
+
 	// in case of signal, handler tries a common->Quit
 	// we use set_exit to maintain a correct exit code
 	if ( set_exit ) {
@@ -244,6 +261,9 @@ Sys_Init
 =================
 */
 void Sys_Init( void ) {
+	if(consoleLog != NULL)
+		common->Printf("Logging console output to %s/dhewm3log.txt\n", Posix_GetSavePath());
+
 	Posix_InitConsoleInput();
 	com_pid.SetInteger( getpid() );
 	common->Printf( "pid: %d\n", com_pid.GetInteger() );
@@ -331,12 +351,28 @@ ID_TIME_T Sys_FileTimeStamp(FILE * fp) {
 }
 
 char *Sys_GetClipboardData(void) {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	return SDL_GetClipboardText();
+#else
 	Sys_Printf( "TODO: Sys_GetClipboardData\n" );
 	return NULL;
+#endif
+}
+
+void Sys_FreeClipboardData( char* data ) {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	SDL_free( data );
+#else
+	assert( 0 && "why is this called, Sys_GetClipboardData() isn't implemented for SDL1.2" );
+#endif
 }
 
 void Sys_SetClipboardData( const char *string ) {
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	SDL_SetClipboardText( string );
+#else
 	Sys_Printf( "TODO: Sys_SetClipboardData\n" );
+#endif
 }
 
 /*
@@ -383,12 +419,100 @@ int Sys_GetDriveFreeSpace( const char *path ) {
 static const int   crashSigs[]     = {  SIGILL,   SIGABRT,   SIGFPE,   SIGSEGV };
 static const char* crashSigNames[] = { "SIGILL", "SIGABRT", "SIGFPE", "SIGSEGV" };
 
-#if ( defined(__linux__) && defined(__GLIBC__) ) || defined(__FreeBSD__) || defined(__APPLE__)
-  // TODO: https://github.com/ianlancetaylor/libbacktrace looks interesting and also supports windows apparently
+#if ( defined(__linux__) && defined(__GLIBC__) ) || defined(__FreeBSD__) || (defined(__APPLE__) && !defined(OSX_TIGER))
   #define D3_HAVE_BACKTRACE
   #include <execinfo.h>
 #endif
 
+// unlike Sys_Printf() this doesn't call tty_Hide(); and tty_Show();
+// to minimize interaction with broken dhewm3 state
+// (but unlike regular printf() it'll also write to dhewm3log.txt)
+static void CrashPrintf(const char* msg, ...)
+{
+	va_list argptr;
+	va_start( argptr, msg );
+	Sys_VPrintf( msg, argptr );
+	va_end( argptr );
+}
+
+#ifdef D3_HAVE_LIBBACKTRACE
+// non-ancient versions of GCC and clang include libbacktrace
+// for ancient versions it can be built from https://github.com/ianlancetaylor/libbacktrace
+#include <backtrace.h>
+#include <cxxabi.h> // for demangling C++ symbols
+
+static struct backtrace_state *bt_state = NULL;
+
+static void bt_error_callback( void *data, const char *msg, int errnum )
+{
+	CrashPrintf("libbacktrace ERROR: %d - %s\n", errnum, msg);
+}
+
+static void bt_syminfo_callback( void *data, uintptr_t pc, const char *symname,
+								 uintptr_t symval, uintptr_t symsize )
+{
+	if (symname != NULL) {
+		int status;
+		// FIXME: sucks that __cxa_demangle() insists on using malloc().. but so does printf()
+		char* name = abi::__cxa_demangle(symname, NULL, NULL, &status);
+		if (name != NULL) {
+			symname = name;
+		}
+		CrashPrintf("  %zu %s\n", pc, symname);
+		free(name);
+	} else {
+		CrashPrintf("  %zu (unknown symbol)\n", pc);
+	}
+}
+
+static int bt_pcinfo_callback( void *data, uintptr_t pc, const char *filename, int lineno, const char *function )
+{
+	if (data != NULL) {
+		int* hadInfo = (int*)data;
+		*hadInfo = (function != NULL);
+	}
+
+	if (function != NULL) {
+		int status;
+		// FIXME: sucks that __cxa_demangle() insists on using malloc()..
+		char* name = abi::__cxa_demangle(function, NULL, NULL, &status);
+		if (name != NULL) {
+			function = name;
+		}
+
+		const char* fileNameNeo = strstr(filename, "/neo/");
+		if (fileNameNeo != NULL) {
+			filename = fileNameNeo+1; // I want "neo/bla/blub.cpp:42"
+		}
+		CrashPrintf("  %zu %s:%d %s\n", pc, filename, lineno, function);
+		free(name);
+	}
+
+	return 0;
+}
+
+static void bt_error_dummy( void *data, const char *msg, int errnum )
+{
+	//CrashPrintf("ERROR-DUMMY: %d - %s\n", errnum, msg);
+}
+
+static int bt_simple_callback(void *data, uintptr_t pc)
+{
+	int pcInfoWorked = 0;
+	// if this fails, the executable doesn't have debug info, that's ok (=> use bt_error_dummy())
+	backtrace_pcinfo(bt_state, pc, bt_pcinfo_callback, bt_error_dummy, &pcInfoWorked);
+	if (!pcInfoWorked) { // no debug info? use normal symbols instead
+		// yes, it would be easier to call backtrace_syminfo() in bt_pcinfo_callback() if function == NULL,
+		// but some libbacktrace versions (e.g. in Ubuntu 18.04's g++-7) don't call bt_pcinfo_callback
+		// at all if no debug info was available - which is also the reason backtrace_full() can't be used..
+		backtrace_syminfo(bt_state, pc, bt_syminfo_callback, bt_error_callback, NULL);
+	}
+
+	return 0;
+}
+
+#endif
+
 static void signalhandlerCrash(int sig)
 {
 	const char* name = "";
@@ -400,29 +524,41 @@ static void signalhandlerCrash(int sig)
 	// TODO: should probably use a custom print function around write(STDERR_FILENO, ...)
 	//       because printf() could allocate which is not good if processes state is fscked
 	//       (could use backtrace_symbols_fd() then)
-	printf("Looks like %s crashed with signal %s (%d) - sorry!\n", ENGINE_VERSION, name, sig);
+	CrashPrintf("\n\nLooks like %s crashed with signal %s (%d) - sorry!\n", ENGINE_VERSION, name, sig);
 
-#ifdef D3_HAVE_BACKTRACE
+#ifdef D3_HAVE_LIBBACKTRACE
+	if (bt_state != NULL) {
+		int skip = 1; // skip this function in backtrace
+		backtrace_simple(bt_state, skip, bt_simple_callback, bt_error_callback, NULL);
+	} else {
+		CrashPrintf("(No backtrace because libbacktrace state is NULL)\n");
+	}
+#elif defined(D3_HAVE_BACKTRACE)
 	// this is partly based on Yamagi Quake II code
 	void* array[128];
 	int size = backtrace(array, sizeof(array)/sizeof(array[0]));
 	char** strings = backtrace_symbols(array, size);
 
-	printf("\nBacktrace:\n");
+	CrashPrintf("\nBacktrace:\n");
 
 	for(int i = 0; i < size; i++) {
-		printf("  %s\n", strings[i]);
+		CrashPrintf("  %s\n", strings[i]);
 	}
 
-	printf("\n");
+	CrashPrintf("\n(Sorry it's not overly useful, build with libbacktrace support to get function names)\n");
 
 	free(strings);
 
 #else
-	printf("(No Backtrace on this platform)\n");
+	CrashPrintf("(No Backtrace on this platform)\n");
 #endif
 
 	fflush(stdout);
+	if(consoleLog != NULL) {
+		fflush(consoleLog);
+		// TODO: fclose(consoleLog); ?
+		//       consoleLog = NULL;
+	}
 
 	raise(sig); // pass it on to system
 }
@@ -463,8 +599,41 @@ static void installSigHandler(int sig, int flags, void (*handler)(int))
 	sigaction(sig, &sigact, NULL);
 }
 
+static bool dirExists(const char* dirPath)
+{
+	struct stat buf = {};
+	if(stat(dirPath, &buf) == 0) {
+		return (buf.st_mode & S_IFMT) == S_IFDIR;
+	}
+	return false;
+}
+
+static bool createPathRecursive(char* path)
+{
+	if(!dirExists(path)) {
+		char* lastDirSep = strrchr(path, '/');
+		if(lastDirSep != NULL) {
+			*lastDirSep = '\0'; // cut off last part of the path and try first with parent directory
+			bool ok = createPathRecursive(path);
+			*lastDirSep = '/'; // restore path
+			// if parent dir was successfully created (or already existed), create this dir
+			if(ok && mkdir(path, 0755) == 0) {
+				return true;
+			}
+		}
+		return false;
+	}
+	return true;
+}
+
 void Posix_InitSignalHandlers( void )
 {
+#ifdef D3_HAVE_LIBBACKTRACE
+	// can't use idStr here and thus can't use Sys_GetPath(PATH_EXE) => added Posix_GetExePath()
+	const char* exePath = Posix_GetExePath();
+	bt_state = backtrace_create_state(exePath[0] ? exePath : NULL, 0, bt_error_callback, NULL);
+#endif
+
 	for(int i=0; i<sizeof(crashSigs)/sizeof(crashSigs[0]); ++i)
 	{
 		installSigHandler(crashSigs[i], SA_RESTART|SA_RESETHAND, signalhandlerCrash);
@@ -472,6 +641,53 @@ void Posix_InitSignalHandlers( void )
 
 	installSigHandler(SIGTTIN, 0, signalhandlerConsoleStuff);
 	installSigHandler(SIGTTOU, 0, signalhandlerConsoleStuff);
+
+	// this is also a good place to open dhewm3log.txt for Sys_VPrintf()
+
+	const char* savePath = Posix_GetSavePath();
+	size_t savePathLen = strlen(savePath);
+	if(savePathLen > 0 && savePathLen < PATH_MAX) {
+		char logPath[PATH_MAX] = {};
+		if(savePath[savePathLen-1] == '/') {
+			--savePathLen;
+		}
+		memcpy(logPath, savePath, savePathLen);
+		logPath[savePathLen] = '\0';
+		if(!createPathRecursive(logPath)) {
+			printf("WARNING: Couldn't create save path '%s'!\n", logPath);
+			return;
+		}
+		char logFileName[PATH_MAX] = {};
+		int fullLogLen = snprintf(logFileName, sizeof(logFileName), "%s/dhewm3log.txt", logPath);
+		// cast to size_t which is unsigned and would get really big if fullLogLen < 0 (=> error in snprintf())
+		if((size_t)fullLogLen >= sizeof(logFileName)) {
+			printf("WARNING: Couldn't create dhewm3log.txt at '%s' because its length would be '%d' which is > PATH_MAX (%zu) or < 0!\n",
+			       logPath, fullLogLen, (size_t)PATH_MAX);
+			return;
+		}
+		struct stat buf;
+		if(stat(logFileName, &buf) == 0) {
+			// logfile exists, rename to dhewm3log-old.txt
+			char oldLogFileName[PATH_MAX] = {};
+			if((size_t)snprintf(oldLogFileName, sizeof(oldLogFileName), "%s/dhewm3log-old.txt", logPath) < sizeof(logFileName))
+			{
+				rename(logFileName, oldLogFileName);
+			}
+		}
+		consoleLog = fopen(logFileName, "w");
+		if(consoleLog == NULL) {
+			printf("WARNING: Couldn't open/create '%s', error was: %d (%s)\n", logFileName, errno, strerror(errno));
+		} else {
+			time_t tt = time(NULL);
+			const struct tm* tms = localtime(&tt);
+			char timeStr[64] = {};
+			strftime(timeStr, sizeof(timeStr), "%F %H:%M:%S", tms);
+			fprintf(consoleLog, "Opened this log at %s\n", timeStr);
+		}
+
+	} else {
+		printf("WARNING: Posix_GetSavePath() returned path with invalid length '%zu'!\n", savePathLen);
+	}
 }
 
 // ----------- signal handling stuff done ------------
@@ -897,38 +1113,48 @@ low level output
 ===============
 */
 
+void Sys_VPrintf( const char *msg, va_list arg ) {
+	// gonna use arg twice, so copy it
+	va_list arg2;
+	va_copy(arg2, arg);
+
+	// first print to stdout
+	vprintf(msg, arg2);
+
+	va_end(arg2); // arg2 is not needed anymore
+
+	// then print to the log, if any
+	if(consoleLog != NULL)
+	{
+		vfprintf(consoleLog, msg, arg);
+	}
+}
+
 void Sys_DebugPrintf( const char *fmt, ... ) {
 	va_list argptr;
 
 	tty_Hide();
 	va_start( argptr, fmt );
-	vprintf( fmt, argptr );
+	Sys_VPrintf( fmt, argptr );
 	va_end( argptr );
 	tty_Show();
 }
 
 void Sys_DebugVPrintf( const char *fmt, va_list arg ) {
 	tty_Hide();
-	vprintf( fmt, arg );
+	Sys_VPrintf( fmt, arg );
 	tty_Show();
 }
 
 void Sys_Printf(const char *msg, ...) {
 	va_list argptr;
-
 	tty_Hide();
 	va_start( argptr, msg );
-	vprintf( msg, argptr );
+	Sys_VPrintf( msg, argptr );
 	va_end( argptr );
 	tty_Show();
 }
 
-void Sys_VPrintf(const char *msg, va_list arg) {
-	tty_Hide();
-	vprintf(msg, arg);
-	tty_Show();
-}
-
 /*
 ================
 Sys_Error
diff --git a/neo/sys/posix/posix_net.cpp b/neo/sys/posix/posix_net.cpp
index c7c012d..746a945 100644
--- a/neo/sys/posix/posix_net.cpp
+++ b/neo/sys/posix/posix_net.cpp
@@ -109,8 +109,7 @@ ExtractPort
 */
 static bool ExtractPort( const char *src, char *buf, int bufsize, int *port ) {
 	char *p;
-	strncpy( buf, src, bufsize );
-	p = buf; p += Min( bufsize - 1, (int)strlen( src ) ); *p = '\0';
+	idStr::Copynz( buf, src, bufsize );
 	p = strchr( buf, ':' );
 	if ( !p ) {
 		return false;
diff --git a/neo/sys/posix/posix_public.h b/neo/sys/posix/posix_public.h
index 6a5ff88..0a33f89 100644
--- a/neo/sys/posix/posix_public.h
+++ b/neo/sys/posix/posix_public.h
@@ -35,11 +35,14 @@ If you have questions concerning this license or the applicable additional terms
 
 const char*	Posix_Cwd( void );
 
+const char* Posix_GetExePath();
+const char* Posix_GetSavePath();
+
 void		Posix_Exit( int ret );
 void		Posix_SetExit(int ret); // override the exit code
 void		Posix_SetExitSpawn( const char *exeName ); // set the process to be spawned when we quit
 
-void		Posix_InitSignalHandlers( void );
+void		Posix_InitSignalHandlers( void ); // also opens/creates dhewm3log.txt
 void		Posix_InitConsoleInput( void );
 void		Posix_Shutdown( void );
 
diff --git a/neo/sys/stub/openal_stub.cpp b/neo/sys/stub/openal_stub.cpp
index 9496bf2..31099da 100644
--- a/neo/sys/stub/openal_stub.cpp
+++ b/neo/sys/stub/openal_stub.cpp
@@ -143,5 +143,6 @@ AL_API void AL_APIENTRY alSourcef( ALuint sid, ALenum param, ALfloat value ) { }
 AL_API void AL_APIENTRY alSourceUnqueueBuffers( ALuint sid, ALsizei numEntries, ALuint *bids ) { }
 
 AL_API void AL_APIENTRY alSourcePlay( ALuint sid ) { }
+AL_API void AL_APIENTRY alSourcePause( ALuint source ) {}
 
 } // extern "C"
diff --git a/neo/sys/stub/stub_gl.cpp b/neo/sys/stub/stub_gl.cpp
index 55a9b0d..af57108 100644
--- a/neo/sys/stub/stub_gl.cpp
+++ b/neo/sys/stub/stub_gl.cpp
@@ -29,6 +29,13 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "renderer/tr_local.h"
 
+#ifdef _MSC_VER
+#pragma warning(push)
+// for each gl function we get an inconsistent dll linkage warning, because SDL_OpenGL.h says they're dllimport
+// showing one warning is enough and it doesn't matter anyway (these stubs are for the dedicated server)
+#pragma warning( once : 4273 )
+#endif
+
 void APIENTRY glAccum(GLenum op, GLfloat value){};
 void APIENTRY glAlphaFunc(GLenum func, GLclampf ref){};
 GLboolean APIENTRY glAreTexturesResident(GLsizei n, const GLuint *textures, GLboolean *residences){ return false; };
@@ -37,6 +44,7 @@ void APIENTRY glBegin(GLenum mode){};
 void APIENTRY glBindTexture(GLenum target, GLuint texture){};
 void APIENTRY glBitmap(GLsizei width, GLsizei height, GLfloat xorig, GLfloat yorig, GLfloat xmove, GLfloat ymove, const GLubyte *bitmap){};
 void APIENTRY glBlendFunc(GLenum sfactor, GLenum dfactor){};
+void APIENTRY glBlendEquation(GLenum mode){};
 void APIENTRY glCallList(GLuint list){};
 void APIENTRY glCallLists(GLsizei n, GLenum type, const GLvoid *lists){};
 void APIENTRY glClear(GLbitfield mask){};
@@ -382,9 +390,14 @@ GLExtension_t GLimp_ExtensionPointer( const char *a) { return StubFunction; };
 
 bool GLimp_Init(glimpParms_t a) {return true;};
 void GLimp_SetGamma(unsigned short*a, unsigned short*b, unsigned short*c) {};
+void GLimp_ResetGamma() {}
 bool GLimp_SetScreenParms(glimpParms_t parms) { return true; };
 void GLimp_Shutdown() {};
 void GLimp_SwapBuffers() {};
 void GLimp_ActivateContext() {};
 void GLimp_DeactivateContext() {};
 void GLimp_GrabInput(int flags) {};
+
+#ifdef _MSC_VER
+#pragma warning(pop)
+#endif
diff --git a/neo/sys/sys_public.h b/neo/sys/sys_public.h
index fbac1a8..bc98583 100644
--- a/neo/sys/sys_public.h
+++ b/neo/sys/sys_public.h
@@ -57,7 +57,8 @@ typedef enum {
 	SE_NONE,				// evTime is still valid
 	SE_KEY,					// evValue is a key code, evValue2 is the down flag
 	SE_CHAR,				// evValue is an ascii char
-	SE_MOUSE,				// evValue and evValue2 are reletive signed x / y moves
+	SE_MOUSE,				// evValue and evValue2 are relative signed x / y moves
+	SE_MOUSE_ABS,			// evValue and evValue2 are absolute x / y coordinates in the window
 	SE_JOYSTICK_AXIS,		// evValue is an axis number and evValue2 is the current state (-127 to 127)
 	SE_CONSOLE				// evPtr is a char*, from typing something at a non-game console
 } sysEventType_t;
@@ -101,6 +102,7 @@ void			Sys_Quit( void );
 
 // note that this isn't journaled...
 char *			Sys_GetClipboardData( void );
+void			Sys_FreeClipboardData( char* data );
 void			Sys_SetClipboardData( const char *string );
 
 // will go to the various text consoles
@@ -165,6 +167,17 @@ unsigned char	Sys_GetConsoleKey( bool shifted );
 // does nothing on win32, as SE_KEY == SE_CHAR there
 // on other OSes, consider the keyboard mapping
 unsigned char	Sys_MapCharForKey( int key );
+// for keynums between K_FIRST_SCANCODE and K_LAST_SCANCODE
+// returns e.g. "SC_A" for K_SC_A
+const char* Sys_GetScancodeName( int key );
+// returns the localized name of the key (between K_FIRST_SCANCODE and K_LAST_SCANCODE),
+// according to the current keyboard layout - if that name is in ASCII or corresponds
+// to a "High-ASCII" char supported by Doom3.
+// Otherwise returns the same name as Sys_GetScancodeName()
+// !! Returned string is only valid until next call to this function !!
+const char* Sys_GetLocalizedScancodeName( int key );
+// returns keyNum_t (K_SC_* constant) for given scancode name (like "SC_A")
+int Sys_GetKeynumForScancodeName( const char* name );
 
 // keyboard input polling
 int				Sys_PollKeyboardInputEvents( void );
@@ -308,6 +321,8 @@ const char *		Sys_GetThreadName( int *index = 0 );
 extern void Sys_InitThreads();
 extern void Sys_ShutdownThreads();
 
+bool Sys_IsMainThread();
+
 const int MAX_CRITICAL_SECTIONS		= 5;
 
 enum {
diff --git a/neo/sys/threads.cpp b/neo/sys/threads.cpp
index a0fe7c8..0c1f0e8 100644
--- a/neo/sys/threads.cpp
+++ b/neo/sys/threads.cpp
@@ -44,6 +44,9 @@ static bool			waiting[MAX_TRIGGER_EVENTS] = { };
 static xthreadInfo	*thread[MAX_THREADS] = { };
 static size_t		thread_count = 0;
 
+static bool mainThreadIDset = false;
+static SDL_threadID mainThreadID = -1;
+
 /*
 ==============
 Sys_Sleep
@@ -68,6 +71,9 @@ Sys_InitThreads
 ==================
 */
 void Sys_InitThreads() {
+	mainThreadID = SDL_ThreadID();
+	mainThreadIDset = true;
+
 	// critical sections
 	for (int i = 0; i < MAX_CRITICAL_SECTIONS; i++) {
 		mutex[i] = SDL_CreateMutex();
@@ -314,3 +320,18 @@ const char *Sys_GetThreadName(int *index) {
 
 	return "main";
 }
+
+
+/*
+==================
+Sys_IsMainThread
+returns true if the current thread is the main thread
+==================
+*/
+bool Sys_IsMainThread() {
+	if ( mainThreadIDset )
+		return SDL_ThreadID() == mainThreadID;
+	// if this is called before mainThreadID is set, we haven't created
+	// any threads yet so it should be the main thread
+	return true;
+}
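Note on the Sys_IsMainThread() hunk above: the pattern is "remember the ID of the thread that ran the init function, then compare against it later, and treat any call made before init as coming from the main thread". The sketch below shows the same idea with the C++ standard library instead of SDL, so it can be tried in isolation. All names in it (InitMainThreadId, IsMainThread, IsMainThreadFromWorker, DemoMainThreadCheck) are hypothetical and not part of this patch; the atomic flag is an extra precaution of the sketch, not something the patch does.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical stand-ins for mainThreadID / mainThreadIDset from the patch.
static std::thread::id g_mainThreadId;
static std::atomic<bool> g_mainThreadIdSet{ false };

// Call once from the main thread during startup (the patch does this in Sys_InitThreads()).
void InitMainThreadId() {
	g_mainThreadId = std::this_thread::get_id();
	g_mainThreadIdSet.store( true, std::memory_order_release );
}

// Returns true if the calling thread is the one that called InitMainThreadId().
bool IsMainThread() {
	if ( g_mainThreadIdSet.load( std::memory_order_acquire ) ) {
		return std::this_thread::get_id() == g_mainThreadId;
	}
	// Not initialized yet: assume no other threads exist, so this is the main thread.
	return true;
}

// Evaluates IsMainThread() on a freshly spawned worker thread and returns its answer.
bool IsMainThreadFromWorker() {
	bool result = true;
	std::thread worker( [&result] { result = IsMainThread(); } );
	worker.join(); // join() makes the worker's write to result visible here
	return result;
}

// Small self-check: true on the initializing thread, false on a worker.
void DemoMainThreadCheck() {
	InitMainThreadId();
	assert( IsMainThread() );
	assert( !IsMainThreadFromWorker() );
}
```

The fallback "return true" is only correct as long as nothing spawns a thread before initialization, which is the same assumption the comment in the patch states.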
diff --git a/neo/tools/af/DialogAFBody.cpp b/neo/tools/af/DialogAFBody.cpp
index a15156a..8504e10 100644
--- a/neo/tools/af/DialogAFBody.cpp
+++ b/neo/tools/af/DialogAFBody.cpp
@@ -707,7 +707,7 @@ BOOL DialogAFBody::OnInitDialog()  {
 DialogAFBody::OnToolHitTest
 ================
 */
-int DialogAFBody::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFBody::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFBody.h b/neo/tools/af/DialogAFBody.h
index 8e93da9..a438911 100644
--- a/neo/tools/af/DialogAFBody.h
+++ b/neo/tools/af/DialogAFBody.h
@@ -49,7 +49,7 @@ public:
 protected:
 	virtual BOOL		OnInitDialog();
 	virtual void		DoDataExchange( CDataExchange* pDX );    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnShowWindow( BOOL bShow, UINT nStatus );
 	afx_msg void		OnCbnSelchangeComboBodies();
diff --git a/neo/tools/af/DialogAFConstraint.cpp b/neo/tools/af/DialogAFConstraint.cpp
index 67c4122..693c1a9 100644
--- a/neo/tools/af/DialogAFConstraint.cpp
+++ b/neo/tools/af/DialogAFConstraint.cpp
@@ -414,7 +414,7 @@ BOOL DialogAFConstraint::OnInitDialog()  {
 DialogAFConstraint::OnToolHitTest
 ================
 */
-int DialogAFConstraint::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraint::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraint.h b/neo/tools/af/DialogAFConstraint.h
index 634d5e9..c3a1c90 100644
--- a/neo/tools/af/DialogAFConstraint.h
+++ b/neo/tools/af/DialogAFConstraint.h
@@ -54,7 +54,7 @@ public:
 protected:
 	virtual BOOL		OnInitDialog();
 	virtual void		DoDataExchange( CDataExchange* pDX );    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnShowWindow( BOOL bShow, UINT nStatus );
 	afx_msg void		OnCbnSelchangeComboConstraints();
diff --git a/neo/tools/af/DialogAFConstraintBallAndSocket.cpp b/neo/tools/af/DialogAFConstraintBallAndSocket.cpp
index 1568499..d962e5d 100644
--- a/neo/tools/af/DialogAFConstraintBallAndSocket.cpp
+++ b/neo/tools/af/DialogAFConstraintBallAndSocket.cpp
@@ -325,7 +325,7 @@ void DialogAFConstraintBallAndSocket::UpdateFile( void ) {
 DialogAFConstraintBallAndSocket::OnToolHitTest
 ================
 */
-int DialogAFConstraintBallAndSocket::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintBallAndSocket::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintBallAndSocket.h b/neo/tools/af/DialogAFConstraintBallAndSocket.h
index 35a9a27..f19b07b 100644
--- a/neo/tools/af/DialogAFConstraintBallAndSocket.h
+++ b/neo/tools/af/DialogAFConstraintBallAndSocket.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedRadioAnchorJoint();
 	afx_msg void		OnBnClickedRadioAnchorCoordinates();
diff --git a/neo/tools/af/DialogAFConstraintFixed.cpp b/neo/tools/af/DialogAFConstraintFixed.cpp
index 849716c..00e9e49 100644
--- a/neo/tools/af/DialogAFConstraintFixed.cpp
+++ b/neo/tools/af/DialogAFConstraintFixed.cpp
@@ -150,7 +150,7 @@ void DialogAFConstraintFixed::UpdateFile( void ) {
 DialogAFConstraintFixed::OnToolHitTest
 ================
 */
-int DialogAFConstraintFixed::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintFixed::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintFixed.h b/neo/tools/af/DialogAFConstraintFixed.h
index 64af6f3..fcb7997 100644
--- a/neo/tools/af/DialogAFConstraintFixed.h
+++ b/neo/tools/af/DialogAFConstraintFixed.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 
 	DECLARE_MESSAGE_MAP()
diff --git a/neo/tools/af/DialogAFConstraintHinge.cpp b/neo/tools/af/DialogAFConstraintHinge.cpp
index dd0031a..67a5187 100644
--- a/neo/tools/af/DialogAFConstraintHinge.cpp
+++ b/neo/tools/af/DialogAFConstraintHinge.cpp
@@ -273,7 +273,7 @@ void DialogAFConstraintHinge::UpdateFile( void ) {
 DialogAFConstraintHinge::OnToolHitTest
 ================
 */
-int DialogAFConstraintHinge::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintHinge::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintHinge.h b/neo/tools/af/DialogAFConstraintHinge.h
index b98bd60..9e0c4d4 100644
--- a/neo/tools/af/DialogAFConstraintHinge.h
+++ b/neo/tools/af/DialogAFConstraintHinge.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedRadioAnchorJoint();
 	afx_msg void		OnBnClickedRadioAnchorCoordinates();
diff --git a/neo/tools/af/DialogAFConstraintSlider.cpp b/neo/tools/af/DialogAFConstraintSlider.cpp
index 77e3186..06fa36b 100644
--- a/neo/tools/af/DialogAFConstraintSlider.cpp
+++ b/neo/tools/af/DialogAFConstraintSlider.cpp
@@ -210,7 +210,7 @@ void DialogAFConstraintSlider::UpdateFile( void ) {
 DialogAFConstraintSlider::OnToolHitTest
 ================
 */
-int DialogAFConstraintSlider::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintSlider::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintSlider.h b/neo/tools/af/DialogAFConstraintSlider.h
index cbbdf11..d5cf53b 100644
--- a/neo/tools/af/DialogAFConstraintSlider.h
+++ b/neo/tools/af/DialogAFConstraintSlider.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedRadioSliderAxisBone();
 	afx_msg void		OnBnClickedRadioSliderAxisAngles();
diff --git a/neo/tools/af/DialogAFConstraintSpring.cpp b/neo/tools/af/DialogAFConstraintSpring.cpp
index b3428fd..ccb18c9 100644
--- a/neo/tools/af/DialogAFConstraintSpring.cpp
+++ b/neo/tools/af/DialogAFConstraintSpring.cpp
@@ -293,7 +293,7 @@ void DialogAFConstraintSpring::UpdateFile( void ) {
 DialogAFConstraintSpring::OnToolHitTest
 ================
 */
-int DialogAFConstraintSpring::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintSpring::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintSpring.h b/neo/tools/af/DialogAFConstraintSpring.h
index 31e4871..f9c7f8c 100644
--- a/neo/tools/af/DialogAFConstraintSpring.h
+++ b/neo/tools/af/DialogAFConstraintSpring.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedRadioAnchorJoint();
 	afx_msg void		OnBnClickedRadioAnchorCoordinates();
diff --git a/neo/tools/af/DialogAFConstraintUniversal.cpp b/neo/tools/af/DialogAFConstraintUniversal.cpp
index 613b711..bcd186b 100644
--- a/neo/tools/af/DialogAFConstraintUniversal.cpp
+++ b/neo/tools/af/DialogAFConstraintUniversal.cpp
@@ -369,7 +369,7 @@ void DialogAFConstraintUniversal::UpdateFile( void ) {
 DialogAFConstraintUniversal::OnToolHitTest
 ================
 */
-int DialogAFConstraintUniversal::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFConstraintUniversal::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFConstraintUniversal.h b/neo/tools/af/DialogAFConstraintUniversal.h
index 927f6cf..23c3e37 100644
--- a/neo/tools/af/DialogAFConstraintUniversal.h
+++ b/neo/tools/af/DialogAFConstraintUniversal.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedRadioAnchorJoint();
 	afx_msg void		OnBnClickedRadioAnchorCoordinates();
diff --git a/neo/tools/af/DialogAFProperties.cpp b/neo/tools/af/DialogAFProperties.cpp
index be6232c..1bfd5e2 100644
--- a/neo/tools/af/DialogAFProperties.cpp
+++ b/neo/tools/af/DialogAFProperties.cpp
@@ -253,7 +253,7 @@ void DialogAFProperties::ClearFile( void ) {
 DialogAFProperties::OnToolHitTest
 ================
 */
-int DialogAFProperties::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFProperties::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFProperties.h b/neo/tools/af/DialogAFProperties.h
index 2abf7d3..8b48d05 100644
--- a/neo/tools/af/DialogAFProperties.h
+++ b/neo/tools/af/DialogAFProperties.h
@@ -46,7 +46,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange( CDataExchange* pDX );    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnEnChangeEditModel();
 	afx_msg void		OnEnChangeEditSkin();
diff --git a/neo/tools/af/DialogAFView.cpp b/neo/tools/af/DialogAFView.cpp
index 652a6f2..0ba1db3 100644
--- a/neo/tools/af/DialogAFView.cpp
+++ b/neo/tools/af/DialogAFView.cpp
@@ -150,7 +150,7 @@ void DialogAFView::DoDataExchange(CDataExchange* pDX) {
 DialogAFView::OnToolHitTest
 ================
 */
-int DialogAFView::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR DialogAFView::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CDialog::OnToolHitTest( point, pTI );
 	return DefaultOnToolHitTest( toolTips, this, point, pTI );
 }
diff --git a/neo/tools/af/DialogAFView.h b/neo/tools/af/DialogAFView.h
index da2f411..21ded2b 100644
--- a/neo/tools/af/DialogAFView.h
+++ b/neo/tools/af/DialogAFView.h
@@ -41,7 +41,7 @@ public:
 
 protected:
 	virtual void		DoDataExchange(CDataExchange* pDX);    // DDX/DDV support
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL		OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg void		OnBnClickedCheckViewBodies();
 	afx_msg void		OnBnClickedCheckViewBodynames();
diff --git a/neo/tools/comafx/CDIB.cpp b/neo/tools/comafx/CDIB.cpp
index 0bfd3b0..6ddd74a 100644
--- a/neo/tools/comafx/CDIB.cpp
+++ b/neo/tools/comafx/CDIB.cpp
@@ -340,8 +340,8 @@ void CDIB::ExpandBlt(int nXDest,int nYDest,int xRatio,int yRatio,CDIB& dibSrc,in
 
 void CDIB::Expand(int nXDest,int nYDest,int xRatio,int yRatio,CDIB& dibSrc,int xSrc,int ySrc,int  nSWidth,int nSHeight)
 {
-int xNum,yNum,xErr,yErr;
-int nDWidth,nDHeight;
+	int xNum,yNum,xErr,yErr;
+	int nDWidth,nDHeight;
 
 	nDWidth = nSWidth*xRatio;
 	nDHeight = nSHeight*yRatio;
@@ -354,8 +354,8 @@ int nDWidth,nDHeight;
 	xErr = nDWidth%xRatio;
 	yErr = nDHeight%yRatio;
 
-unsigned char *buffer,*srcPtr,*destPtr,*ptr;
-int i,j,k;
+	unsigned char *buffer,*srcPtr,*destPtr,*ptr;
+	int i,j,k = 0;
 
 	buffer = (unsigned char *)malloc(nDWidth+20);
 	if(!buffer) return;
@@ -687,7 +687,7 @@ void CDIB::GetPixel(UINT x,UINT y,int& pixel)
 
 BOOL CDIB::Make8Bit(CDIB& dib)
 {
-int nBits;
+	int nBits;
 	ASSERT(Width() == dib.Width());
 	ASSERT(Height() == dib.Height());
 	nBits = dib.GetBitCount();
@@ -708,7 +708,6 @@ int nBits;
 	default:
 		return FALSE;
 	}
-	return FALSE;
 }
 
 /*
@@ -813,7 +812,7 @@ unsigned char cols[256];
 
 int CDIB::ClosestColor(RGBQUAD *pRgb)
 {
-unsigned int dist=BIG_DISTANCE,i,d,c;
+unsigned int dist=BIG_DISTANCE,i,d,c = 0;
 RGBQUAD *pQuad=m_pRGB;
 unsigned int pSize=GetPaletteSize();
 	for(i=0; i < pSize;i++)
diff --git a/neo/tools/comafx/CPathTreeCtrl.cpp b/neo/tools/comafx/CPathTreeCtrl.cpp
index cb580a3..ca68bc7 100644
--- a/neo/tools/comafx/CPathTreeCtrl.cpp
+++ b/neo/tools/comafx/CPathTreeCtrl.cpp
@@ -241,7 +241,7 @@ END_MESSAGE_MAP()
 CPathTreeCtrl::OnToolHitTest
 ================
 */
-int CPathTreeCtrl::OnToolHitTest( CPoint point, TOOLINFO * pTI ) const {
+INT_PTR CPathTreeCtrl::OnToolHitTest( CPoint point, TOOLINFO * pTI ) const {
 	RECT rect;
 
 	UINT nFlags;
@@ -249,7 +249,7 @@ int CPathTreeCtrl::OnToolHitTest( CPoint point, TOOLINFO * pTI ) const {
 	if( nFlags & TVHT_ONITEM ) {
 		GetItemRect( hitem, &rect, TRUE );
 		pTI->hwnd = m_hWnd;
-		pTI->uId = (UINT)hitem;
+		pTI->uId = (UINT_PTR)hitem;
 		pTI->lpszText = LPSTR_TEXTCALLBACK;
 		pTI->rect = rect;
 		return pTI->uId;
@@ -272,7 +272,7 @@ BOOL CPathTreeCtrl::OnToolTipText( UINT id, NMHDR * pNMHDR, LRESULT * pResult )
 	*pResult = 0;
 
 	// Do not process the message from built in tooltip
-	if( nID == (UINT)m_hWnd &&
+	if( nID == (UINT_PTR)m_hWnd &&
 			(( pNMHDR->code == TTN_NEEDTEXTA && pTTTA->uFlags & TTF_IDISHWND ) ||
 			( pNMHDR->code == TTN_NEEDTEXTW && pTTTW->uFlags & TTF_IDISHWND ) ) ) {
 		return FALSE;
@@ -295,7 +295,7 @@ BOOL CPathTreeCtrl::OnToolTipText( UINT id, NMHDR * pNMHDR, LRESULT * pResult )
 	if( nFlags & TVHT_ONITEM ) {
 		// relay message to parent
 		pTTTA->hdr.hwndFrom = GetSafeHwnd();
-		pTTTA->hdr.idFrom = (UINT) hitem;
+		pTTTA->hdr.idFrom = (UINT_PTR) hitem;
 		if ( GetParent()->SendMessage( WM_NOTIFY, ( TTN_NEEDTEXT << 16 ) | GetDlgCtrlID(), (LPARAM)pTTTA ) == FALSE ) {
 			return FALSE;
 		}
diff --git a/neo/tools/comafx/CPathTreeCtrl.h b/neo/tools/comafx/CPathTreeCtrl.h
index 6441e45..1ec6439 100644
--- a/neo/tools/comafx/CPathTreeCtrl.h
+++ b/neo/tools/comafx/CPathTreeCtrl.h
@@ -84,7 +84,7 @@ public:
 
 protected:
 	virtual void		PreSubclassWindow();
-	virtual int			OnToolHitTest( CPoint point, TOOLINFO * pTI ) const;
+	virtual INT_PTR		OnToolHitTest( CPoint point, TOOLINFO * pTI ) const;
 	afx_msg BOOL		OnToolTipText( UINT id, NMHDR * pNMHDR, LRESULT * pResult );
 
 	DECLARE_MESSAGE_MAP()
diff --git a/neo/tools/comafx/CSyntaxRichEditCtrl.cpp b/neo/tools/comafx/CSyntaxRichEditCtrl.cpp
index 0315d69..4ede3d7 100644
--- a/neo/tools/comafx/CSyntaxRichEditCtrl.cpp
+++ b/neo/tools/comafx/CSyntaxRichEditCtrl.cpp
@@ -1361,7 +1361,7 @@ void CSyntaxRichEditCtrl::GoToLine( int line ) {
 CSyntaxRichEditCtrl::OnToolHitTest
 ================
 */
-int CSyntaxRichEditCtrl::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
+INT_PTR CSyntaxRichEditCtrl::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const {
 	CRichEditCtrl::OnToolHitTest( point, pTI );
 
 	pTI->hwnd = GetSafeHwnd();
@@ -1380,8 +1380,12 @@ CSyntaxRichEditCtrl::OnToolTipNotify
 ================
 */
 BOOL CSyntaxRichEditCtrl::OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult ) {
+
+#ifdef _UNICODE
 	TOOLTIPTEXTA* pTTTA = (TOOLTIPTEXTA*)pNMHDR;
+#else
 	TOOLTIPTEXTW* pTTTW = (TOOLTIPTEXTW*)pNMHDR;
+#endif
 
 	*pResult = 0;
 
diff --git a/neo/tools/comafx/CSyntaxRichEditCtrl.h b/neo/tools/comafx/CSyntaxRichEditCtrl.h
index 274b841..5a8ff53 100644
--- a/neo/tools/comafx/CSyntaxRichEditCtrl.h
+++ b/neo/tools/comafx/CSyntaxRichEditCtrl.h
@@ -128,7 +128,7 @@ public:
 	void					ReplaceText( int startCharIndex, int endCharIndex, const char *replace );
 
 protected:
-	virtual int				OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
+	virtual INT_PTR			OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	afx_msg BOOL			OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 	afx_msg UINT			OnGetDlgCode();
 	afx_msg void			OnChar( UINT nChar, UINT nRepCnt, UINT nFlags );
diff --git a/neo/tools/comafx/DialogColorPicker.cpp b/neo/tools/comafx/DialogColorPicker.cpp
index c92570b..c0b17c1 100644
--- a/neo/tools/comafx/DialogColorPicker.cpp
+++ b/neo/tools/comafx/DialogColorPicker.cpp
@@ -46,7 +46,7 @@ class CMyColorDialog : public CColorDialog
 	 // Construction
 public:
 	 CMyColorDialog( COLORREF clrInit = 0, DWORD dwFlags = 0, CWnd *pParentWnd = NULL );
-	 virtual int DoModal();
+	 virtual INT_PTR DoModal();
 
 protected:
 	 enum { NCUSTCOLORS = 16 };
@@ -120,7 +120,7 @@ CMyColorDialog::CMyColorDialog( COLORREF clrInit, DWORD dwFlags,
 	 m_cc.lpCustColors = c_CustColors;
 }
 
-int CMyColorDialog::DoModal() {
+INT_PTR CMyColorDialog::DoModal() {
 	 int code = CColorDialog::DoModal();
 	 SaveCustColors();
 	 return code;
@@ -912,7 +912,6 @@ void CDialogColorPicker::DrawLines(CDC *pDC)
 	pt[1] = PointOnLine(Vertex,Left,(color.g*GreenLen)/255,GreenLen);
 	pt[2] = PointOnLine(Vertex,Right,(color.b*BlueLen)/255,BlueLen);
 
-	COLORREF col = RGB(255,255,255);
 	CRect cr;
 
 	for(int i = 0; i < 3; i++ ) {
@@ -1281,7 +1280,7 @@ void CDialogColorPicker::OnSysColorChange()
 	LoadMappedBitmap(m_RgbBitmap,IDB_BITMAP_RGB,size);
 }
 
-void CDialogColorPicker::OnTimer(UINT nIDEvent)  {
+void CDialogColorPicker::OnTimer(UINT_PTR nIDEvent)  {
 	if ( UpdateParent ) {
 		UpdateParent( color.r, color.g, color.b, 1.0f );
 	}
diff --git a/neo/tools/comafx/DialogColorPicker.h b/neo/tools/comafx/DialogColorPicker.h
index f71527a..c18173b 100644
--- a/neo/tools/comafx/DialogColorPicker.h
+++ b/neo/tools/comafx/DialogColorPicker.h
@@ -112,7 +112,7 @@ protected:
 	afx_msg void	OnChangeEditSat();
 	afx_msg void	OnChangeEditVal();
 	afx_msg void	OnChangeEditOverbright();
-	afx_msg void	OnTimer(UINT nIDEvent);
+	afx_msg void	OnTimer(UINT_PTR nIDEvent);
 	afx_msg void	OnBtnColor();
 	//}}AFX_MSG
 	DECLARE_MESSAGE_MAP()
diff --git a/neo/tools/comafx/StdAfx.cpp b/neo/tools/comafx/StdAfx.cpp
index 5a9773c..361a93d 100644
--- a/neo/tools/comafx/StdAfx.cpp
+++ b/neo/tools/comafx/StdAfx.cpp
@@ -72,7 +72,7 @@ void InitAfx( void ) {
 DefaultOnToolHitTest
 ================
 */
-int DefaultOnToolHitTest( const toolTip_t *toolTips, const CDialog *dialog, CPoint point, TOOLINFO* pTI ) {
+INT_PTR DefaultOnToolHitTest( const toolTip_t *toolTips, const CDialog *dialog, CPoint point, TOOLINFO* pTI ) {
 	CWnd *wnd;
 	RECT clientRect, rect;
 
@@ -108,7 +108,7 @@ BOOL DefaultOnToolTipNotify( const toolTip_t *toolTips, UINT id, NMHDR *pNMHDR,
 
 	*pResult = 0;
 
-	UINT nID = pNMHDR->idFrom;
+	UINT_PTR nID = pNMHDR->idFrom;
 	if ( pTTTA->uFlags & TTF_IDISHWND ) {
 		// idFrom is actually the HWND of the tool
 		nID = ::GetDlgCtrlID((HWND)nID);
diff --git a/neo/tools/comafx/StdAfx.h b/neo/tools/comafx/StdAfx.h
index 6b336fa..253a988 100644
--- a/neo/tools/comafx/StdAfx.h
+++ b/neo/tools/comafx/StdAfx.h
@@ -50,7 +50,7 @@ typedef struct toolTip_s {
 	char *tip;
 } toolTip_t;
 
-int DefaultOnToolHitTest( const toolTip_t *toolTips, const CDialog *dialog, CPoint point, TOOLINFO* pTI );
+INT_PTR DefaultOnToolHitTest( const toolTip_t *toolTips, const CDialog *dialog, CPoint point, TOOLINFO* pTI );
 BOOL DefaultOnToolTipNotify( const toolTip_t *toolTips, UINT id, NMHDR *pNMHDR, LRESULT *pResult );
 
 // edit control
diff --git a/neo/tools/comafx/VectorCtl.cpp b/neo/tools/comafx/VectorCtl.cpp
index 613677c..dba1b0b 100644
--- a/neo/tools/comafx/VectorCtl.cpp
+++ b/neo/tools/comafx/VectorCtl.cpp
@@ -153,8 +153,8 @@ COLORREF CVectorCtl::CalcLight (double dx, double dy, double dz)
 {
 	double NL = dx * m_dVec[0] + dy * m_dVec[1] + dz * m_dVec[2],
 		   RV = 2.0 * NL,
-		   rx = m_dVec[0] - (dx * RV),
-		   ry = m_dVec[1] - (dy * RV),
+		   //rx = m_dVec[0] - (dx * RV),
+		   //ry = m_dVec[1] - (dy * RV),
 		   rz = m_dVec[2] - (dz * RV);
 
 	if (NL < 0.0)   // Diffuse coefficient
diff --git a/neo/tools/common/AlphaPopup.cpp b/neo/tools/common/AlphaPopup.cpp
index 68eb557..ee62047 100644
--- a/neo/tools/common/AlphaPopup.cpp
+++ b/neo/tools/common/AlphaPopup.cpp
@@ -91,7 +91,7 @@ LRESULT CALLBACK AlphaSlider_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 			v = (float)((short)LOWORD(lParam)-5) / (float)(rClient.right - rClient.left - 10);
 			if ( v < 0 ) v = 0;
 			if ( v > 1.0f ) v = 1.0f;
-			SetWindowLong ( hwnd, GWL_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
+			SetWindowLongPtr ( hwnd, GWLP_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
 			InvalidateRect ( hwnd, NULL, FALSE );
 
 			SetCapture ( hwnd );
@@ -100,7 +100,7 @@ LRESULT CALLBACK AlphaSlider_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 		}
 
 		case WM_MOUSEMOVE:
-			if ( LOWORD(GetWindowLong ( hwnd, GWL_USERDATA ) ) & 0x8000 )
+			if ( LOWORD( GetWindowLongPtr ( hwnd, GWLP_USERDATA ) ) & 0x8000 )
 			{
 				RECT  rClient;
 				float v;
@@ -109,13 +109,13 @@ LRESULT CALLBACK AlphaSlider_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 				v = (float)((short)LOWORD(lParam)-5) / (float)(rClient.right - rClient.left - 10);
 				if ( v < 0 ) v = 0;
 				if ( v > 1.0f ) v = 1.0f;
-				SetWindowLong ( hwnd, GWL_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
+				SetWindowLongPtr ( hwnd, GWLP_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
 				InvalidateRect ( hwnd, NULL, FALSE );
 			}
 			break;
 
 		case WM_LBUTTONUP:
-			if ( LOWORD(GetWindowLong ( hwnd, GWL_USERDATA ) ) & 0x8000 )
+			if ( LOWORD( GetWindowLongPtr ( hwnd, GWLP_USERDATA ) ) & 0x8000 )
 			{
 				RECT  rClient;
 				float v;
@@ -124,7 +124,7 @@ LRESULT CALLBACK AlphaSlider_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 				v = (float)((short)LOWORD(lParam)-5) / (float)(rClient.right - rClient.left - 10);
 				if ( v < 0 ) v = 0;
 				if ( v > 1.0f ) v = 1.0f;
-				SetWindowLong ( hwnd, GWL_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
+				SetWindowLongPtr ( hwnd, GWLP_USERDATA, MAKELONG(0x8000,(unsigned short)(255.0f * v)) );
 				InvalidateRect ( hwnd, NULL, FALSE );
 				ReleaseCapture ( );
 				SendMessage ( GetParent ( hwnd ), WM_COMMAND, MAKELONG(GetWindowLong (hwnd,GWL_ID),0), 0 );
@@ -172,7 +172,7 @@ LRESULT CALLBACK AlphaSlider_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 
 			// Draw the thumb
 			RECT rThumb;
-			short s = HIWORD(GetWindowLong ( hwnd, GWL_USERDATA ));
+			short s = HIWORD(GetWindowLongPtr ( hwnd, GWLP_USERDATA ));
 			float thumb = (float)(short)s;
 			thumb /= 255.0f;
 			thumb *= (float)(rDraw.right-rDraw.left);
@@ -242,10 +242,10 @@ INT_PTR CALLBACK AlphaSelectDlg_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LP
 			color      = GetRValue(ColorButton_GetColor ((HWND)lParam));
 
 			// The lParam for the alpha select dialog is the window handle of the button pressed
-			SetWindowLong ( hwnd, GWL_USERDATA, lParam );
+			SetWindowLongPtr ( hwnd, GWLP_USERDATA, lParam );
 
 			// Subclass the alpha
-			SetWindowLong ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWL_USERDATA, MAKELONG(0,color) );
+			SetWindowLongPtr ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWLP_USERDATA, MAKELONG(0,color) );
 
 			// Numbers only on the edit box and start it with the current alpha value.
 			NumberEdit_Attach ( GetDlgItem ( hwnd, IDC_GUIED_ALPHA ) );
@@ -288,15 +288,15 @@ INT_PTR CALLBACK AlphaSelectDlg_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LP
 					}
 
 					// Set the current alpha value in the slider
-					SetWindowLong ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWL_USERDATA, MAKELONG(0,(255.0f * value)) );
+					SetWindowLongPtr ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWLP_USERDATA, MAKELONG(0,(255.0f * value)) );
 					break;
 				}
 
 				case IDC_GUIED_ALPHASLIDER:
 				case IDOK:
 				{
-					int color = (short)HIWORD(GetWindowLong ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWL_USERDATA ));
-					ColorButton_SetColor ( (HWND)GetWindowLong ( hwnd, GWL_USERDATA ), RGB(color,color,color) );
+					int color = (short)HIWORD( GetWindowLongPtr ( GetDlgItem ( hwnd, IDC_GUIED_ALPHASLIDER ), GWLP_USERDATA ));
+					ColorButton_SetColor ( (HWND)GetWindowLongPtr ( hwnd, GWLP_USERDATA ), RGB(color,color,color) );
 					EndDialog ( hwnd, 0 );
 					break;
 				}
diff --git a/neo/tools/common/ColorButton.cpp b/neo/tools/common/ColorButton.cpp
index 044e31f..5f07d5d 100644
--- a/neo/tools/common/ColorButton.cpp
+++ b/neo/tools/common/ColorButton.cpp
@@ -47,7 +47,7 @@ void ColorButton_SetColor ( HWND hWnd, COLORREF color )
 	{
 		return;
 	}
-	SetWindowLong ( hWnd, GWL_USERDATA, color );
+	SetWindowLongPtr ( hWnd, GWLP_USERDATA, color );
 	InvalidateRect ( hWnd, NULL, FALSE );
 }
 
@@ -94,7 +94,7 @@ Retrieves the current color button color
 */
 COLORREF ColorButton_GetColor ( HWND hWnd )
 {
-	return (COLORREF) GetWindowLong ( hWnd, GWL_USERDATA );
+	return (COLORREF) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 }
 
 /*
@@ -198,7 +198,7 @@ void ColorButton_DrawItem ( HWND hWnd, LPDRAWITEMSTRUCT dis )
 	// Draw Color
 	if ((state & ODS_DISABLED) == 0)
 	{
-		HBRUSH color = CreateSolidBrush ( (COLORREF)GetWindowLong ( hWnd, GWL_USERDATA ) );
+		HBRUSH color = CreateSolidBrush ( (COLORREF)GetWindowLongPtr ( hWnd, GWLP_USERDATA ) );
 		FillRect ( hDC, &rDraw, color );
 		FrameRect ( hDC, &rDraw, (HBRUSH)::GetStockObject(BLACK_BRUSH));
 		DeleteObject( color );
diff --git a/neo/tools/common/MaskEdit.cpp b/neo/tools/common/MaskEdit.cpp
index c497035..aff8c57 100644
--- a/neo/tools/common/MaskEdit.cpp
+++ b/neo/tools/common/MaskEdit.cpp
@@ -45,7 +45,7 @@ Prevents the invalid characters from being entered
 */
 LRESULT CALLBACK MaskEdit_WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEMaskEdit* edit = (rvGEMaskEdit*)GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGEMaskEdit* edit = (rvGEMaskEdit*)GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 	WNDPROC		  wndproc = edit->mProc;
 
 	switch ( msg )
@@ -60,7 +60,7 @@ LRESULT CALLBACK MaskEdit_WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM l
 
 		case WM_DESTROY:
 			delete edit;
-			SetWindowLong ( hWnd, GWL_WNDPROC, (LONG)wndproc );
+			SetWindowLongPtr ( hWnd, GWLP_WNDPROC, (LONG_PTR)wndproc );
 			break;
 	}
 
@@ -77,10 +77,10 @@ Attaches the mask edit control to a normal edit control
 void MaskEdit_Attach ( HWND hWnd, const char* invalid )
 {
 	rvGEMaskEdit* edit = new rvGEMaskEdit;
-	edit->mProc = (WNDPROC)GetWindowLong ( hWnd, GWL_WNDPROC );
+	edit->mProc = (WNDPROC)GetWindowLongPtr ( hWnd, GWLP_WNDPROC );
 	strcpy ( edit->mInvalid, invalid );
-	SetWindowLong ( hWnd, GWL_USERDATA, (LONG)edit );
-	SetWindowLong ( hWnd, GWL_WNDPROC, (LONG)MaskEdit_WndProc );
+	SetWindowLongPtr ( hWnd, GWLP_USERDATA, (LONG_PTR)edit );
+	SetWindowLongPtr ( hWnd, GWLP_WNDPROC, (LONG_PTR)MaskEdit_WndProc );
 }
 
 /*
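Note on the SetWindowLong -> SetWindowLongPtr hunks above: on 64-bit Windows, LONG stays 32 bits wide while pointers are 64 bits, so storing a pointer through a (LONG) cast can drop its upper half; LONG_PTR is pointer-sized. The sketch below illustrates that difference portably, using std::intptr_t as a stand-in for LONG_PTR and std::uint32_t as a stand-in for a 32-bit LONG. The function names are hypothetical and the code does not call any Win32 API.

```cpp
#include <cassert>
#include <cstdint>

// std::intptr_t plays the role of LONG_PTR here: wide enough to hold a pointer.
static_assert( sizeof( std::intptr_t ) >= sizeof( void* ), "intptr_t must hold a pointer" );

// Round-trips a pointer through a pointer-sized integer, as the
// (LONG_PTR) casts in the patch do when storing user data.
template <typename T>
T* RoundTripPointerSized( T* p ) {
	const std::intptr_t stored = reinterpret_cast<std::intptr_t>( p );
	return reinterpret_cast<T*>( stored );
}

// Reports whether the pointer's value would survive being squeezed into
// 32 bits, which is what the old (LONG) casts did on 64-bit builds.
bool SurvivesIn32Bits( const void* p ) {
	const std::uintptr_t full = reinterpret_cast<std::uintptr_t>( p );
	return static_cast<std::uintptr_t>( static_cast<std::uint32_t>( full ) ) == full;
}

// Small self-check: the pointer-sized round trip always yields the original pointer.
void DemoPointerRoundTrip() {
	int value = 42;
	int* back = RoundTripPointerSized( &value );
	assert( back == &value );
	assert( *back == 42 );
}
```

Whether SurvivesIn32Bits() returns true for a given object depends on where the platform places it in memory, which is why the 32-bit casts could appear to work in some runs and fail in others.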
diff --git a/neo/tools/common/OpenFileDialog.cpp b/neo/tools/common/OpenFileDialog.cpp
index 4927dc0..405d48f 100644
--- a/neo/tools/common/OpenFileDialog.cpp
+++ b/neo/tools/common/OpenFileDialog.cpp
@@ -306,7 +306,7 @@ void rvOpenFileDialog::HandleInitDialog ( void )
 	SendMessage( mWndLookin,CBEM_SETIMAGELIST,0,(LPARAM) mImageList );
 
 	// Back button is a bitmap button
-	SendMessage( GetDlgItem ( mWnd, IDC_TOOLS_BACK ), BM_SETIMAGE, IMAGE_BITMAP, (LONG) mBackBitmap );
+	SendMessage( GetDlgItem ( mWnd, IDC_TOOLS_BACK ), BM_SETIMAGE, IMAGE_BITMAP, (LONG_PTR) mBackBitmap );
 
 	// Allow custom titles
 	SetWindowText ( mWnd, mTitle );
@@ -392,7 +392,8 @@ void rvOpenFileDialog::SetFilter ( const char* s )
 		if ( semi != -1 )
 		{
 			filter  = filters.Left ( semi );
-			filters = filters.Right ( filters.Length ( ) - semi );
+			filters = filters.Right ( filters.Length ( ) - (semi + 1));
+			filters.Strip(' ');
 		}
 		else
 		{
@@ -413,13 +414,13 @@ Dialog Procedure for the open file dialog
 */
 INT_PTR rvOpenFileDialog::DlgProc ( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam )
 {
-	rvOpenFileDialog* dlg = (rvOpenFileDialog*) GetWindowLong ( wnd, GWL_USERDATA );
+	rvOpenFileDialog* dlg = (rvOpenFileDialog*) GetWindowLongPtr ( wnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
 		case WM_INITDIALOG:
 			dlg = (rvOpenFileDialog*) lparam;
-			SetWindowLong ( wnd, GWL_USERDATA, lparam );
+			SetWindowLongPtr ( wnd, GWLP_USERDATA, lparam );
 			dlg->mWnd = wnd;
 			dlg->HandleInitDialog ( );
 			return TRUE;
diff --git a/neo/tools/common/PropertyGrid.cpp b/neo/tools/common/PropertyGrid.cpp
index efe8707..23f70f7 100644
--- a/neo/tools/common/PropertyGrid.cpp
+++ b/neo/tools/common/PropertyGrid.cpp
@@ -75,10 +75,11 @@ bool rvPropertyGrid::Create ( HWND parent, int id, int style )
 	mStyle = style;
 
 	// Create the List view
-	mWindow = CreateWindowEx ( 0, "LISTBOX", "", WS_VSCROLL|WS_CHILD|WS_VISIBLE|LBS_OWNERDRAWFIXED|LBS_NOINTEGRALHEIGHT|LBS_NOTIFY, 0, 0, 0, 0, parent, (HMENU)id, win32.hInstance, 0 );
-	mListWndProc = (WNDPROC)GetWindowLong ( mWindow, GWL_WNDPROC );
-	SetWindowLong ( mWindow, GWL_USERDATA, (LONG)this );
-	SetWindowLong ( mWindow, GWL_WNDPROC, (LONG)WndProc );
+	const HMENU hmenuID = (HMENU)(intptr_t)id; // DG: apparently an int ID (instead of a handle/pointer) can be valid here, depending on window style
+	mWindow = CreateWindowEx ( 0, "LISTBOX", "", WS_VSCROLL|WS_CHILD|WS_VISIBLE|LBS_OWNERDRAWFIXED|LBS_NOINTEGRALHEIGHT|LBS_NOTIFY, 0, 0, 0, 0, parent, hmenuID, win32.hInstance, 0 );
+	mListWndProc = (WNDPROC)GetWindowLongPtr ( mWindow, GWLP_WNDPROC );
+	SetWindowLongPtr ( mWindow, GWLP_USERDATA, (LONG_PTR)this );
+	SetWindowLongPtr ( mWindow, GWLP_WNDPROC, (LONG_PTR)WndProc );
 
 	LoadLibrary ( "Riched20.dll" );
 	mEdit = CreateWindowEx ( 0, "RichEdit20A", "", WS_CHILD, 0, 0, 0, 0, mWindow, (HMENU) 999, win32.hInstance, NULL );
@@ -198,7 +199,7 @@ void rvPropertyGrid::FinishEdit ( void )
 		nmpg.mName  = item->mName;
 		nmpg.mValue = value;
 
-		if ( !SendMessage ( GetParent ( mWindow ), WM_NOTIFY, 0, (LONG)&nmpg ) )
+		if ( !SendMessage ( GetParent ( mWindow ), WM_NOTIFY, 0, (LPARAM)&nmpg ) )
 		{
 			mState = STATE_EDIT;
 			SetFocus ( mEdit );
@@ -281,7 +282,7 @@ int rvPropertyGrid::AddItem ( const char* name, const char* value, EItemType typ
 
 	insert = SendMessage(mWindow,LB_GETCOUNT,0,0) - ((mStyle&PGS_ALLOWINSERT)?1:0);
 
-	return SendMessage ( mWindow, LB_INSERTSTRING, insert, (LONG)item );
+	return SendMessage ( mWindow, LB_INSERTSTRING, insert, (LPARAM)item );
 }
 
 /*
@@ -330,7 +331,7 @@ void rvPropertyGrid::RemoveAllItems ( void )
 		item = new rvPropertyGridItem;
 		item->mName = "";
 		item->mValue = "";
-		SendMessage ( mWindow, LB_ADDSTRING, 0, (LONG)item );
+		SendMessage ( mWindow, LB_ADDSTRING, 0, (LPARAM)item );
 	}
 }
 
@@ -383,7 +384,7 @@ Window procedure for property grid
 */
 LRESULT CALLBACK rvPropertyGrid::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvPropertyGrid* grid = (rvPropertyGrid*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvPropertyGrid* grid = (rvPropertyGrid*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -398,7 +399,7 @@ LRESULT CALLBACK rvPropertyGrid::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, L
 			nmkey.hdr.hwndFrom = grid->mWindow;
 			nmkey.nVKey = wParam;
 			nmkey.uFlags = HIWORD(lParam);
-			nmkey.hdr.idFrom = GetWindowLong ( hWnd, GWL_ID );
+			nmkey.hdr.idFrom = GetWindowLongPtr ( hWnd, GWLP_ID );
 			SendMessage ( GetParent ( hWnd ), WM_NOTIFY, nmkey.hdr.idFrom, (LPARAM)&nmkey );
 			break;
 		}
@@ -462,7 +463,7 @@ LRESULT CALLBACK rvPropertyGrid::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, L
 		}
 
 		case WM_COMMAND:
-			if ( lParam == (long)grid->mEdit )
+			if ( lParam == (LPARAM)grid->mEdit )
 			{
 				if ( HIWORD(wParam) == EN_KILLFOCUS )
 				{
@@ -545,7 +546,7 @@ LRESULT CALLBACK rvPropertyGrid::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, L
 			ScreenToClient ( hWnd, &point );
 			if ( point.x >= grid->mSplitter - 2 && point.x <= grid->mSplitter + 2 )
 			{
-				SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZEWE)));
+				SetCursor ( LoadCursor ( NULL, IDC_SIZEWE));
 				return TRUE;
 			}
 			break;
@@ -586,8 +587,10 @@ bool rvPropertyGrid::ReflectMessage ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM
 
 		case WM_MEASUREITEM:
 		{
+			float scaling_factor = Win_GetWindowScalingFactor(hWnd);
+
 			MEASUREITEMSTRUCT* mis = (MEASUREITEMSTRUCT*) lParam;
-			mis->itemHeight = 18;
+			mis->itemHeight = 18 * scaling_factor;
 			return true;
 		}
 	}
diff --git a/neo/tools/common/RollupPanel.cpp b/neo/tools/common/RollupPanel.cpp
index 09407fd..282772c 100644
--- a/neo/tools/common/RollupPanel.cpp
+++ b/neo/tools/common/RollupPanel.cpp
@@ -161,8 +161,8 @@ int rvRollupPanel::InsertItem ( const char* caption, HWND dialog, bool autoDestr
 	item->mDialog		 = dialog;
 	item->mButton		 = button;
 	item->mGroupBox		 = groupbox;
-	item->mOldDlgProc	 = (WNDPROC) GetWindowLong ( dialog, DWL_DLGPROC );
-	item->mOldButtonProc = (WNDPROC) GetWindowLong ( button, GWL_WNDPROC );
+	item->mOldDlgProc	 = (WNDPROC) GetWindowLongPtr ( dialog, DWLP_DLGPROC );
+	item->mOldButtonProc = (WNDPROC) GetWindowLongPtr ( button, GWLP_WNDPROC );
 	item->mAutoDestroy	 = autoDestroy;
 	strcpy ( item->mCaption, caption );
 
@@ -176,17 +176,17 @@ int rvRollupPanel::InsertItem ( const char* caption, HWND dialog, bool autoDestr
 	}
 
 	// Store data with the dialog window in its user data
-	SetWindowLong ( dialog, GWL_USERDATA,	(LONG)item );
+	SetWindowLongPtr ( dialog, GWLP_USERDATA,	(LONG_PTR)item );
 
 	// Attach item to button through user data
-	SetWindowLong ( button, GWL_USERDATA,	(LONG)item );
-	SetWindowLong ( button, GWL_ID,			index );
+	SetWindowLongPtr( button, GWLP_USERDATA,	(LONG_PTR)item );
+	SetWindowLongPtr( button, GWLP_ID,			index );
 
 	// Subclass dialog
-	SetWindowLong ( dialog, DWL_DLGPROC, (LONG)DialogProc );
+	SetWindowLongPtr( dialog, DWLP_DLGPROC, (LONG_PTR)DialogProc );
 
 	// SubClass button
-	SetWindowLong ( button, GWL_WNDPROC, (LONG)ButtonProc );
+	SetWindowLongPtr( button, GWLP_WNDPROC, (LONG_PTR)ButtonProc );
 
 	// Update
 	mItemHeight += RP_PGBUTTONHEIGHT+(RP_GRPBOXINDENT/2);
@@ -699,8 +699,8 @@ Dialog procedure for items
 */
 LRESULT CALLBACK rvRollupPanel::DialogProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
 {
-	RPITEM*			item  = (RPITEM*)GetWindowLong ( hWnd, GWL_USERDATA );
-	rvRollupPanel*	_this = (rvRollupPanel*)GetWindowLong ( GetParent ( hWnd ), GWL_USERDATA );
+	RPITEM*			item  = (RPITEM*)GetWindowLongPtr ( hWnd, GWLP_USERDATA );
+	rvRollupPanel*	_this = (rvRollupPanel*)GetWindowLongPtr( GetParent ( hWnd ), GWLP_USERDATA );
 
 	RECT r;
 	GetClientRect ( _this->mWindow, &r );
@@ -771,7 +771,7 @@ LRESULT CALLBACK rvRollupPanel::ButtonProc (HWND hWnd, UINT uMsg, WPARAM wParam,
 		return FALSE;
 	}
 
-	RPITEM* item = (RPITEM*)GetWindowLong(hWnd, GWL_USERDATA);
+	RPITEM* item = (RPITEM*)GetWindowLongPtr(hWnd, GWLP_USERDATA);
 	return ::CallWindowProc( item->mOldButtonProc, hWnd, uMsg, wParam, lParam );
 }
 
@@ -785,7 +785,7 @@ Window procedure for rollup panel
 LRESULT CALLBACK rvRollupPanel::WindowProc (HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam)
 {
 	rvRollupPanel* panel;
-	panel = (rvRollupPanel*)GetWindowLong (hWnd, GWL_USERDATA);
+	panel = (rvRollupPanel*)GetWindowLongPtr (hWnd, GWLP_USERDATA);
 
 	switch ( uMsg )
 	{
@@ -796,7 +796,7 @@ LRESULT CALLBACK rvRollupPanel::WindowProc (HWND hWnd, UINT uMsg, WPARAM wParam,
 			// Attach the class to the window first
 			cs = (LPCREATESTRUCT) lParam;
 			panel = (rvRollupPanel*) cs->lpCreateParams;
-			SetWindowLong ( hWnd, GWL_USERDATA, (LONG)panel );
+			SetWindowLongPtr ( hWnd, GWLP_USERDATA, (LONG_PTR)panel );
 			break;
 		}
 
diff --git a/neo/tools/common/SpinButton.cpp b/neo/tools/common/SpinButton.cpp
index 8520bf4..0b5d390 100644
--- a/neo/tools/common/SpinButton.cpp
+++ b/neo/tools/common/SpinButton.cpp
@@ -33,7 +33,7 @@ If you have questions concerning this license or the applicable additional terms
 
 void SpinButton_SetIncrement ( HWND hWnd, float inc )
 {
-	SetWindowLong ( hWnd, GWL_USERDATA, (long)(inc * 100.0f) );
+	SetWindowLongPtr ( hWnd, GWLP_USERDATA, (long)(inc * 100.0f) );
 }
 
 void SpinButton_SetRange ( HWND hWnd, float minRange, float maxRange )
@@ -51,11 +51,11 @@ void SpinButton_HandleNotify ( NMHDR* hdr )
 	float value;
 	GetWindowText ( (HWND)SendMessage ( hdr->hwndFrom, UDM_GETBUDDY, 0, 0 ), strValue, 63 );
 
-	float inc = (float)GetWindowLong ( hdr->hwndFrom, GWL_USERDATA );
+	float inc = (float)GetWindowLongPtr ( hdr->hwndFrom, GWLP_USERDATA );
 	if ( inc == 0 )
 	{
 		inc = 100.0f;
-		SetWindowLong ( hdr->hwndFrom, GWL_USERDATA, 100 );
+		SetWindowLongPtr ( hdr->hwndFrom, GWLP_USERDATA, 100 );
 	}
 	inc /= 100.0f;
 
@@ -72,7 +72,7 @@ void SpinButton_HandleNotify ( NMHDR* hdr )
 
 	LONG minRange;
 	LONG maxRange;
-	SendMessage ( hdr->hwndFrom, UDM_GETRANGE32, (LONG)&minRange, (LONG)&maxRange );
+	SendMessage ( hdr->hwndFrom, UDM_GETRANGE32, (LONG_PTR)&minRange, (LONG_PTR)&maxRange );
 	if ( minRange !=  0 || maxRange != 0 )
 	{
 		float minRangef = (float)(long)minRange / 100.0f;
diff --git a/neo/tools/compilers/dmap/tritjunction.cpp b/neo/tools/compilers/dmap/tritjunction.cpp
index 1974aea..0c33d5a 100644
--- a/neo/tools/compilers/dmap/tritjunction.cpp
+++ b/neo/tools/compilers/dmap/tritjunction.cpp
@@ -598,7 +598,13 @@ void	FixGlobalTjunctions( uEntity_t *e ) {
 			if ( !modelName ) {
 				continue;
 			}
-			if ( !strstr( modelName, ".lwo" ) && !strstr( modelName, ".ase" ) && !strstr( modelName, ".ma" ) ) {
+			if ( !strstr( modelName, ".lwo" ) 
+				&& !strstr( modelName, ".ase" ) 
+				&& !strstr( modelName, ".ma" ) 
+#if USE_COLLADA
+				&& !strstr(modelName, ".dae")
+#endif
+				) {
 				continue;
 			}
 
diff --git a/neo/tools/compilers/roqvq/codec.cpp b/neo/tools/compilers/roqvq/codec.cpp
index 612d088..a195ecc 100644
--- a/neo/tools/compilers/roqvq/codec.cpp
+++ b/neo/tools/compilers/roqvq/codec.cpp
@@ -225,7 +225,8 @@ void codec::Segment( int *alist, float *flist, int numElements, float rmse)
 				fy = RMULT*(float)(codebook2[onf][numc+0]) +
 						GMULT*(float)(codebook2[onf][numc+1]) +
 							BMULT*(float)(codebook2[onf][numc+2]) + 0.5f;
-				 if (fy<0) fy = 0; if (fy>255) fy = 255;
+				if (fy<0) fy = 0;
+				if (fy>255) fy = 255;
 
 				fcr += RIEMULT*(float)(codebook2[onf][numc+0]);
 				fcr += GIEMULT*(float)(codebook2[onf][numc+1]);
@@ -1319,7 +1320,9 @@ byte *idataA, *idataB;
 		j = 0;
 		for( i=0; i<numQuadCels; i++ ) {
 			if (qStatus[i].size == 8 && qStatus[i].status) {
-				if (qStatus[i].status < DEAD) num[qStatus[i].status]++; j++;
+				if (qStatus[i].status < DEAD)
+					num[qStatus[i].status]++;
+				j++;
 			}
 		}
 		common->Printf("sparseEncode: for 08x08 CCC = %d, FCC = %d, MOT = %d, SLD = %d, PAT = %d\n", num[CCC], num[FCC], num[MOT], num[SLD], num[PAT]);
@@ -1327,7 +1330,9 @@ byte *idataA, *idataB;
 		for(i=0;i<DEAD;i++) num[i] = 0;
 		for( i=0; i<numQuadCels; i++ ) {
 			if (qStatus[i].size == 4 && qStatus[i].status) {
-				if (qStatus[i].status < DEAD) num[qStatus[i].status]++; j++;
+				if (qStatus[i].status < DEAD)
+					num[qStatus[i].status]++;
+				j++;
 			}
 		}
 		common->Printf("sparseEncode: for 04x04 CCC = %d, FCC = %d, MOT = %d, SLD = %d, PAT = %d\n", num[CCC], num[FCC], num[MOT], num[SLD], num[PAT]);
@@ -1420,7 +1425,9 @@ byte *idataA, *idataB;
 		j = 0;
 		for( i=0; i<numQuadCels; i++ ) {
 			if (qStatus[i].size == 8 && qStatus[i].status) {
-				if (qStatus[i].status < DEAD) num[qStatus[i].status]++; j++;
+				if (qStatus[i].status < DEAD)
+					num[qStatus[i].status]++;
+				j++;
 			}
 		}
 		common->Printf("sparseEncode: for 08x08 CCC = %d, FCC = %d, MOT = %d, SLD = %d, PAT = %d\n", num[CCC], num[FCC], num[MOT], num[SLD], num[PAT]);
@@ -1428,7 +1435,9 @@ byte *idataA, *idataB;
 		for(i=0;i<DEAD;i++) num[i] = 0;
 		for( i=0; i<numQuadCels; i++ ) {
 			if (qStatus[i].size == 4 && qStatus[i].status) {
-				if (qStatus[i].status < DEAD) num[qStatus[i].status]++; j++;
+				if (qStatus[i].status < DEAD)
+					num[qStatus[i].status]++;
+				j++;
 			}
 		}
 		common->Printf("sparseEncode: for 04x04 CCC = %d, FCC = %d, MOT = %d, SLD = %d, PAT = %d\n", num[CCC], num[FCC], num[MOT], num[SLD], num[PAT]);
diff --git a/neo/tools/compilers/roqvq/roq.h b/neo/tools/compilers/roqvq/roq.h
index f03a4ee..2d6c8bd 100644
--- a/neo/tools/compilers/roqvq/roq.h
+++ b/neo/tools/compilers/roqvq/roq.h
@@ -29,7 +29,7 @@ If you have questions concerning this license or the applicable additional terms
 #define __roq_h__
 
 //#define JPEG_INTERNALS
-#include <jpeglib.h>
+//#include <jpeglib.h> // DG: unused
 
 #include "tools/compilers/roqvq/gdefs.h"
 #include "tools/compilers/roqvq/roqParam.h"
diff --git a/neo/tools/debugger/DebuggerApp.cpp b/neo/tools/debugger/DebuggerApp.cpp
index 82aacaa..1176d84 100644
--- a/neo/tools/debugger/DebuggerApp.cpp
+++ b/neo/tools/debugger/DebuggerApp.cpp
@@ -37,8 +37,8 @@ If you have questions concerning this license or the applicable additional terms
 rvDebuggerApp::rvDebuggerApp
 ================
 */
-rvDebuggerApp::rvDebuggerApp ( ) :
-	mOptions ( "Software\\id Software\\DOOM3\\Tools\\Debugger" )
+rvDebuggerApp::rvDebuggerApp ( ) //:
+	//mOptions ( "Software\\id Software\\DOOM3\\Tools\\Debugger" )
 {
 	mInstance		= NULL;
 	mDebuggerWindow = NULL;
diff --git a/neo/tools/debugger/DebuggerApp.h b/neo/tools/debugger/DebuggerApp.h
index d659b1d..4438e97 100644
--- a/neo/tools/debugger/DebuggerApp.h
+++ b/neo/tools/debugger/DebuggerApp.h
@@ -29,7 +29,7 @@ If you have questions concerning this license or the applicable additional terms
 #define DEBUGGERAPP_H_
 
 #include "../../sys/win32/win_local.h"
-#include "../../framework/sync/Msg.h"
+//#include "../../framework/sync/Msg.h"
 
 #ifndef REGISTRYOPTIONS_H_
 #include "../common/RegistryOptions.h"
@@ -49,13 +49,15 @@ If you have questions concerning this license or the applicable additional terms
 
 // These were changed to static by ID so to make it easy we just throw them
 // in this header
-const int MAX_MSGLEN = 1400;
+// must be large enough to list all threads in mars_city1
+const int MAX_MSGLEN = 8600;
 
 class rvDebuggerApp
 {
 public:
 
 	rvDebuggerApp ( );
+	~rvDebuggerApp();
 
 	bool				Initialize				( HINSTANCE hInstance );
 	int					Run						( void );
diff --git a/neo/tools/debugger/DebuggerBreakpoint.cpp b/neo/tools/debugger/DebuggerBreakpoint.cpp
index 975f2eb..89df7f5 100644
--- a/neo/tools/debugger/DebuggerBreakpoint.cpp
+++ b/neo/tools/debugger/DebuggerBreakpoint.cpp
@@ -25,20 +25,23 @@ If you have questions concerning this license or the applicable additional terms
 
 ===========================================================================
 */
-
+#if defined( ID_ALLOW_TOOLS )
 #include "tools/edit_gui_common.h"
-
-
 #include "DebuggerApp.h"
+#else
+#include "debugger_common.h"
+#endif
+
 #include "DebuggerBreakpoint.h"
 
 int rvDebuggerBreakpoint::mNextID = 1;
 
-rvDebuggerBreakpoint::rvDebuggerBreakpoint ( const char* filename, int linenumber, int id )
+rvDebuggerBreakpoint::rvDebuggerBreakpoint ( const char* filename, int linenumber, int id, bool onceOnly )
 {
 	mFilename = filename;
 	mLineNumber = linenumber;
 	mEnabled = true;
+	mOnceOnly = onceOnly;
 
 	if ( id == -1 )
 	{
diff --git a/neo/tools/debugger/DebuggerBreakpoint.h b/neo/tools/debugger/DebuggerBreakpoint.h
index a1f07a6..84e85de 100644
--- a/neo/tools/debugger/DebuggerBreakpoint.h
+++ b/neo/tools/debugger/DebuggerBreakpoint.h
@@ -28,25 +28,28 @@ If you have questions concerning this license or the applicable additional terms
 #ifndef DEBUGGERBREAKPOINT_H_
 #define DEBUGGERBREAKPOINT_H_
 
+class idProgram;
+
 class rvDebuggerBreakpoint
 {
 public:
 
-	rvDebuggerBreakpoint ( const char* filename, int linenumber, int id = -1 );
+	rvDebuggerBreakpoint ( const char* filename, int linenumber, int id = -1, bool onceOnly = false );
 	rvDebuggerBreakpoint ( rvDebuggerBreakpoint& bp );
 	~rvDebuggerBreakpoint ( void );
 
 	const char*		GetFilename		( void );
 	int				GetLineNumber	( void );
 	int				GetID			( void );
+	bool			GetOnceOnly     ( void );
 
 protected:
 
 	bool	mEnabled;
+	bool	mOnceOnly;
 	int		mID;
 	int		mLineNumber;
 	idStr	mFilename;
-
 private:
 
 	static int	mNextID;
@@ -67,4 +70,9 @@ ID_INLINE int rvDebuggerBreakpoint::GetID ( void )
 	return mID;
 }
 
+ID_INLINE bool rvDebuggerBreakpoint::GetOnceOnly( void )
+{
+	return mOnceOnly;
+}
+
 #endif // DEBUGGERBREAKPOINT_H_
diff --git a/neo/tools/debugger/DebuggerClient.cpp b/neo/tools/debugger/DebuggerClient.cpp
index 6ce6725..3ffb386 100644
--- a/neo/tools/debugger/DebuggerClient.cpp
+++ b/neo/tools/debugger/DebuggerClient.cpp
@@ -73,7 +73,7 @@ bool rvDebuggerClient::Initialize ( void )
 	}
 
 	// Server must be running on the local host on port 28980
-	Sys_StringToNetAdr ( "localhost", &mServerAdrt, true );
+	Sys_StringToNetAdr ( com_dbgServerAdr.GetString( ), &mServerAdr, true );
 	mServerAdr.port = 27980;
 
 	// Attempt to let the server know we are here.  The server may not be running so this
@@ -110,25 +110,29 @@ Process all incomding messages from the debugger server
 bool rvDebuggerClient::ProcessMessages ( void )
 {
 	netadr_t adrFrom;
-	msg_t	 msg;
+	idBitMsg	 msg;
 	byte	 buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
+	msg.SetSize(MAX_MSGLEN);
+	msg.BeginReading();
 
+	int msgSize;
 	// Check for pending udp packets on the debugger port
-	while ( mPort.GetPacket ( adrFrom, msg.data, msg.cursize, msg.maxsize ) )
+	while ( mPort.GetPacket ( adrFrom, buffer,msgSize, MAX_MSGLEN) )
 	{
-		unsigned short command;
+		short command;
+		msg.Init(buffer, sizeof(buffer));
+		msg.SetSize(msgSize);
+		msg.BeginReading();
 
 		// Only accept packets from the debugger server for security reasons
 		if ( !Sys_CompareNetAdrBase ( adrFrom, mServerAdr ) )
 		{
 			continue;
 		}
+		command = msg.ReadShort ( );
 
-		command = (unsigned short) MSG_ReadShort ( &msg );
-
-		// Is this what we are waiting for?
+		// Is this what we are waiting for? 
 		if ( command == mWaitFor )
 		{
 			mWaitFor = DBMSG_UNKNOWN;
@@ -168,17 +172,39 @@ bool rvDebuggerClient::ProcessMessages ( void )
 			case DBMSG_INSPECTVARIABLE:
 				HandleInspectVariable ( &msg );
 				break;
+
+			case DBMSG_REMOVEBREAKPOINT:
+				HandleRemoveBreakpoint( &msg );
+				break;
+			case DBMSG_INSPECTSCRIPTS:
+				HandleInspectScripts( &msg );
+				break;
 		}
 
 		// Give the window a chance to process the message
-		msg.readcount = 0;
-		msg.bit = 0;
+		msg.SetReadCount(0);
+		msg.SetReadBit(0);
 		gDebuggerApp.GetWindow().ProcessNetMessage ( &msg );
 	}
 
 	return true;
 }
 
+void rvDebuggerClient::HandleRemoveBreakpoint(idBitMsg* msg)
+{
+	long lineNumber;
+	char filename[MAX_PATH];
+
+	// Read the breakpoint info
+
+	lineNumber = msg->ReadInt();
+	msg->ReadString(filename, MAX_PATH);
+
+	rvDebuggerBreakpoint* bp = FindBreakpoint(filename, lineNumber);
+	if(bp)
+		RemoveBreakpoint(bp->GetID());
+}
+
 /*
 ================
 rvDebuggerClient::HandleBreak
@@ -187,19 +213,22 @@ Handle the DBMSG_BREAK message send from the server.  This message is handled
 by caching the file and linenumber where the break occured.
 ================
 */
-void rvDebuggerClient::HandleBreak ( msg_t* msg )
+void rvDebuggerClient::HandleBreak ( idBitMsg* msg )
 {
 	char filename[MAX_PATH];
 
 	mBreak = true;
 
 	// Line number
-	mBreakLineNumber = MSG_ReadInt ( msg );
+	mBreakLineNumber = msg->ReadInt ( );
 
 	// Filename
-	MSG_ReadString ( msg, filename, MAX_PATH );
+	msg->ReadString ( filename, MAX_PATH );
 	mBreakFilename   = filename;
 
+	//int64_t ptr64b = msg->ReadInt64();
+	//mBreakProgram = (idProgram*)ptr64b;
+
 	// Clear the variables
 	mVariables.Clear ( );
 
@@ -211,6 +240,26 @@ void rvDebuggerClient::HandleBreak ( msg_t* msg )
 	WaitFor ( DBMSG_INSPECTTHREADS, 2000 );
 }
 
+
+/*
+================
+rvDebuggerClient::InspectScripts
+
+Instructs the debugger server to send the list of loaded scripts
+================
+*/
+void rvDebuggerClient::InspectScripts ( void )
+{
+	idBitMsg	msg;
+	byte		buffer[MAX_MSGLEN];
+
+	msg.Init(buffer, sizeof(buffer));
+	msg.BeginWriting();
+	msg.WriteShort((short)DBMSG_INSPECTSCRIPTS);
+	SendPacket(msg.GetData(), msg.GetSize());
+}
+
+
 /*
 ================
 rvDebuggerClient::InspectVariable
@@ -222,15 +271,41 @@ will in turn respond back to the client with the variable value
 */
 void rvDebuggerClient::InspectVariable ( const char* name, int callstackDepth )
 {
-	msg_t	 msg;
-	byte	 buffer[MAX_MSGLEN];
+	idBitMsg	msg;
+	byte		buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_INSPECTVARIABLE );
-	MSG_WriteShort ( &msg, (short)(mCallstack.Num()-callstackDepth) );
-	MSG_WriteString ( &msg, name );
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting();
+	msg.WriteShort ( (short)DBMSG_INSPECTVARIABLE );
+	msg.WriteShort ( (short)(mCallstack.Num()-callstackDepth) );
+	msg.WriteString ( name );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize());
+}
+
+/*
+================
+rvDebuggerClient::HandleInspectScripts
+
+Handle the message DBMSG_INSPECTSCRIPTS being sent from the server.  This message
+is handled by adding the script entries to a list for later lookup.
+================
+*/
+void rvDebuggerClient::HandleInspectScripts( idBitMsg* msg )
+{	
+	int totalScripts;
+
+	mServerScripts.Clear();
+
+	// Read all of the script entries specified in the message
+	for (totalScripts = msg->ReadInt(); totalScripts > 0; totalScripts--)
+	{
+		char temp[1024];
+
+		// Script Name
+		msg->ReadString(temp, 1024);
+		mServerScripts.Append(temp);
+	}
 }
 
 /*
@@ -241,29 +316,29 @@ Handle the message DBMSG_INSPECTCALLSTACK being sent from the server.  This mess
 is handled by adding the callstack entries to a list for later lookup.
 ================
 */
-void rvDebuggerClient::HandleInspectCallstack ( msg_t* msg )
+void rvDebuggerClient::HandleInspectCallstack ( idBitMsg* msg )
 {
 	int depth;
 
 	ClearCallstack ( );
 
 	// Read all of the callstack entries specfied in the message
-	for ( depth = (short)MSG_ReadShort ( msg ) ; depth > 0; depth -- )
+	for ( depth = (short)msg->ReadShort ( ) ; depth > 0; depth -- )
 	{
 		rvDebuggerCallstack* entry = new rvDebuggerCallstack;
 
 		char temp[1024];
 
 		// Function name
-		MSG_ReadString ( msg, temp, 1024 );
-		entry->mFunction = temp;
+		msg->ReadString ( temp, 1024 );
+		entry->mFunction = idStr(temp);
 
 		// Filename
-		MSG_ReadString ( msg, temp, 1024 );
-		entry->mFilename = temp;
+		msg->ReadString ( temp, 1024 );
+		entry->mFilename = idStr(temp);
 
 		// Line Number
-		entry->mLineNumber = MSG_ReadInt ( msg );
+		entry->mLineNumber = msg->ReadInt ( );
 
 		// Add to list
 		mCallstack.Append ( entry );
@@ -278,31 +353,31 @@ Handle the message DBMSG_INSPECTTHREADS being sent from the server.  This messag
 is handled by adding the list of threads to a list for later lookup.
 ================
 */
-void rvDebuggerClient::HandleInspectThreads ( msg_t* msg )
+void rvDebuggerClient::HandleInspectThreads ( idBitMsg* msg )
 {
 	int	count;
 
 	ClearThreads ( );
 
 	// Loop over the number of threads in the message
-	for ( count = (short)MSG_ReadShort ( msg ) ; count > 0; count -- )
+	for ( count = (short)msg->ReadShort ( ) ; count > 0; count -- )
 	{
 		rvDebuggerThread* entry = new rvDebuggerThread;
 
 		char temp[1024];
 
 		// Thread name
-		MSG_ReadString ( msg, temp, 1024 );
+		msg->ReadString ( temp, 1024 );
 		entry->mName = temp;
 
 		// Thread ID
-		entry->mID = MSG_ReadInt ( msg );
+		entry->mID = msg->ReadInt ( );
 
 		// Thread state
-		entry->mCurrent = MSG_ReadBits ( msg, 1 ) ? true : false;
-		entry->mDoneProcessing = MSG_ReadBits ( msg, 1 ) ? true : false;
-		entry->mWaiting = MSG_ReadBits ( msg, 1 ) ? true : false;
-		entry->mDying = MSG_ReadBits ( msg, 1 ) ? true : false;
+		entry->mCurrent = msg->ReadBits ( 1 ) ? true : false;
+		entry->mDoneProcessing = msg->ReadBits ( 1 ) ? true : false;
+		entry->mWaiting = msg->ReadBits ( 1 ) ? true : false;
+		entry->mDying = msg->ReadBits ( 1 ) ? true : false;
 
 		// Add thread to list
 		mThreads.Append ( entry );
@@ -317,15 +392,15 @@ Handle the message DBMSG_INSPECTVARIABLE being sent from the server.  This messa
 is handled by adding the inspected variable to a dictionary for later lookup
 ================
 */
-void rvDebuggerClient::HandleInspectVariable ( msg_t* msg )
+void rvDebuggerClient::HandleInspectVariable ( idBitMsg* msg )
 {
 	char	var[1024];
 	char	value[1024];
 	int		callDepth;
 
-	callDepth = (short)MSG_ReadShort ( msg );
-	MSG_ReadString ( msg, var, 1024 );
-	MSG_ReadString ( msg, value, 1024 );
+	callDepth = (short)msg->ReadShort ( );
+	msg->ReadString ( var, 1024 );
+	msg->ReadString ( value, 1024 );
 
 	mVariables.Set ( va("%d:%s", mCallstack.Num()-callDepth, var), value );
 }
@@ -422,7 +497,7 @@ Adds a breakpoint to the client and server with the give nfilename and linenumbe
 */
 int rvDebuggerClient::AddBreakpoint ( const char* filename, int lineNumber, bool onceOnly )
 {
-	int index = mBreakpoints.Append ( new rvDebuggerBreakpoint ( filename, lineNumber ) );
+	int index = mBreakpoints.Append ( new rvDebuggerBreakpoint ( filename, lineNumber, -1, onceOnly ) );
 
 	SendAddBreakpoint ( *mBreakpoints[index] );
 
@@ -463,13 +538,14 @@ Send a message with no data to the debugger server
 */
 void rvDebuggerClient::SendMessage ( EDebuggerMessage dbmsg )
 {
-	msg_t	 msg;
+	idBitMsg	 msg;
 	byte	 buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)dbmsg );
+	msg.Init ( buffer, sizeof( buffer ) );
+	msg.BeginWriting ( );
+	msg.WriteShort ( (short)dbmsg );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize() );
 }
 
 /*
@@ -502,9 +578,9 @@ rvDebuggerClient::SendAddBreakpoint
 Send an individual breakpoint over to the debugger server
 ================
 */
-void rvDebuggerClient::SendAddBreakpoint ( rvDebuggerBreakpoint& bp, bool onceOnly )
+void rvDebuggerClient::SendAddBreakpoint ( rvDebuggerBreakpoint& bp )
 {
-	msg_t	 msg;
+	idBitMsg msg;
 	byte	 buffer[MAX_MSGLEN];
 
 	if ( !mConnected )
@@ -512,14 +588,15 @@ void rvDebuggerClient::SendAddBreakpoint ( rvDebuggerBreakpoint& bp, bool onceOn
 		return;
 	}
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_ADDBREAKPOINT );
-	MSG_WriteBits ( &msg, onceOnly?1:0, 1 );
-	MSG_WriteInt ( &msg, (unsigned long) bp.GetLineNumber ( ) );
-	MSG_WriteInt ( &msg, bp.GetID ( ) );
-	MSG_WriteString ( &msg, bp.GetFilename() );
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting();
+	msg.WriteShort	( (short)DBMSG_ADDBREAKPOINT );
+	msg.WriteBits	( bp.GetOnceOnly() ? 1 : 0, 1 );
+	msg.WriteInt	( (unsigned long) bp.GetLineNumber ( ) );
+	msg.WriteInt	( bp.GetID ( ) );
+	msg.WriteString ( bp.GetFilename() ); // FIXME: this implies make7bit ?!
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize() );
 }
 
 /*
@@ -531,7 +608,7 @@ Sends a remove breakpoint message to the debugger server
 */
 void rvDebuggerClient::SendRemoveBreakpoint ( rvDebuggerBreakpoint& bp )
 {
-	msg_t	 msg;
+	idBitMsg	 msg;
 	byte	 buffer[MAX_MSGLEN];
 
 	if ( !mConnected )
@@ -539,11 +616,12 @@ void rvDebuggerClient::SendRemoveBreakpoint ( rvDebuggerBreakpoint& bp )
 		return;
 	}
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_REMOVEBREAKPOINT );
-	MSG_WriteInt ( &msg, bp.GetID() );
+	msg.Init		( buffer, sizeof( buffer ) );
+	msg.BeginWriting( );
+	msg.WriteShort	( (short)DBMSG_REMOVEBREAKPOINT );
+	msg.WriteInt	( bp.GetID() );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize() );
 }
 
 /*
@@ -583,3 +661,25 @@ void rvDebuggerClient::ClearThreads ( void )
 
 	mThreads.Clear ( );
 }
+/*
+================
+rvDebuggerClient::SendCommand
+================
+*/
+void rvDebuggerClient::SendCommand( const char *cmdStr )
+{
+	idBitMsg msg;
+	byte	 buffer[MAX_MSGLEN];
+
+	if ( !mConnected ) 	{
+		return;
+	}
+
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting( );
+	msg.WriteShort( ( short ) DBMSG_EXECCOMMAND );
+	msg.WriteString( cmdStr ); // FIXME: this implies make7bit ?!
+
+	SendPacket( msg.GetData( ), msg.GetSize( ) );
+}
+
diff --git a/neo/tools/debugger/DebuggerClient.h b/neo/tools/debugger/DebuggerClient.h
index dfc6771..0116bf9 100644
--- a/neo/tools/debugger/DebuggerClient.h
+++ b/neo/tools/debugger/DebuggerClient.h
@@ -28,6 +28,9 @@ If you have questions concerning this license or the applicable additional terms
 #ifndef DEBUGGERCLIENT_H_
 #define DEBUGGERCLIENT_H_
 
+#include "DebuggerBreakpoint.h"
+#include "idlib/containers/StrList.h"
+
 class rvDebuggerCallstack
 {
 public:
@@ -49,9 +52,6 @@ public:
 	bool	mDoneProcessing;
 };
 
-#ifndef DEBUGGERBREAKPOINT_H_
-#include "DebuggerBreakpoint.h"
-#endif
 
 typedef idList<rvDebuggerCallstack*>	rvDebuggerCallstackList;
 typedef idList<rvDebuggerThread*>		rvDebuggerThreadList;
@@ -75,19 +75,23 @@ public:
 	int							GetActiveBreakpointID	( void );
 	const char*					GetBreakFilename		( void );
 	int							GetBreakLineNumber		( void );
+	idProgram*					GetBreakProgram			( void );
 	rvDebuggerCallstackList&	GetCallstack			( void );
 	rvDebuggerThreadList&		GetThreads				( void );
 	const char*					GetVariableValue		( const char* name, int stackDepth );
+	idStrList&					GetServerScripts		( void );
 
 	void						InspectVariable			( const char* name, int callstackDepth );
-
+	void						InspectScripts			( void );
 	void						Break					( void );
 	void						Resume					( void );
 	void						StepInto				( void );
 	void						StepOver				( void );
 
+	void						SendCommand				( const char* cmdStr );
+
 	// Breakpoints
-	int							AddBreakpoint			( const char* filename, int lineNumber, bool onceOnly = false );
+	int							AddBreakpoint			( const char* filename, int lineNumber, bool onceOnly = false);
 	bool						RemoveBreakpoint		( int bpID );
 	void						ClearBreakpoints		( void );
 	int							GetBreakpointCount		( void );
@@ -98,7 +102,7 @@ protected:
 
 	void						SendMessage				( EDebuggerMessage dbmsg );
 	void						SendBreakpoints			( void );
-	void						SendAddBreakpoint		( rvDebuggerBreakpoint& bp, bool onceOnly = false );
+	void						SendAddBreakpoint		( rvDebuggerBreakpoint& bp );
 	void						SendRemoveBreakpoint	( rvDebuggerBreakpoint& bp );
 	void						SendPacket				( void* data, int datasize );
 
@@ -119,6 +123,8 @@ protected:
 
 	EDebuggerMessage			mWaitFor;
 
+	idStrList					mServerScripts;
+
 private:
 
 	void		ClearCallstack				( void );
@@ -127,10 +133,13 @@ private:
 	void		UpdateWatches				( void );
 
 	// Network message handlers
-	void		HandleBreak					( msg_t* msg );
-	void		HandleInspectCallstack		( msg_t* msg );
-	void		HandleInspectThreads		( msg_t* msg );
-	void		HandleInspectVariable		( msg_t* msg );
+	void		HandleBreak					( idBitMsg* msg );
+	void		HandleInspectScripts		( idBitMsg* msg );
+	void		HandleInspectCallstack		( idBitMsg* msg );
+	void		HandleInspectThreads		( idBitMsg* msg );
+	void		HandleInspectVariable		( idBitMsg* msg );
+	void		HandleGameDLLHandle			( idBitMsg* msg );
+	void		HandleRemoveBreakpoint		( idBitMsg* msg );
 };
 
 /*
@@ -286,4 +295,14 @@ ID_INLINE void rvDebuggerClient::SendPacket ( void* data, int size )
 	mPort.SendPacket ( mServerAdr, data, size );
 }
 
+
+/*
+================
+rvDebuggerClient::GetServerScripts
+================
+*/
+ID_INLINE idStrList& rvDebuggerClient::GetServerScripts( void )
+{
+	return mServerScripts;
+}
 #endif // DEBUGGERCLIENT_H_
diff --git a/neo/tools/debugger/DebuggerFindDlg.cpp b/neo/tools/debugger/DebuggerFindDlg.cpp
index 59ce96c..cf5dd28 100644
--- a/neo/tools/debugger/DebuggerFindDlg.cpp
+++ b/neo/tools/debugger/DebuggerFindDlg.cpp
@@ -53,7 +53,7 @@ Launch the dialog
 */
 bool rvDebuggerFindDlg::DoModal ( rvDebuggerWindow* parent )
 {
-	if ( DialogBoxParam ( parent->GetInstance(), MAKEINTRESOURCE(IDD_DBG_FIND), parent->GetWindow(), DlgProc, (LONG)this ) )
+	if ( DialogBoxParam ( parent->GetInstance(), MAKEINTRESOURCE(IDD_DBG_FIND), parent->GetWindow(), DlgProc, (LPARAM)this ) )
 	{
 		return true;
 	}
@@ -70,7 +70,7 @@ Dialog Procedure for the find dialog
 */
 INT_PTR CALLBACK rvDebuggerFindDlg::DlgProc ( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam )
 {
-	rvDebuggerFindDlg* dlg = (rvDebuggerFindDlg*) GetWindowLong ( wnd, GWL_USERDATA );
+	rvDebuggerFindDlg* dlg = (rvDebuggerFindDlg*) GetWindowLongPtr ( wnd, GWLP_USERDATA);
 
 	switch ( msg )
 	{
@@ -80,7 +80,8 @@ INT_PTR CALLBACK rvDebuggerFindDlg::DlgProc ( HWND wnd, UINT msg, WPARAM wparam,
 
 		case WM_INITDIALOG:
 			dlg = (rvDebuggerFindDlg*) lparam;
-			SetWindowLong ( wnd, GWL_USERDATA, (LONG) dlg );
+
+			SetWindowLongPtr ( wnd, GWLP_USERDATA, (LONG_PTR) dlg );
 			dlg->mWnd = wnd;
 			SetWindowText ( GetDlgItem ( dlg->mWnd, IDC_DBG_FIND ), dlg->mFindText );
 			return TRUE;
diff --git a/neo/tools/debugger/DebuggerMessages.h b/neo/tools/debugger/DebuggerMessages.h
index 6e0cba4..91d8b43 100644
--- a/neo/tools/debugger/DebuggerMessages.h
+++ b/neo/tools/debugger/DebuggerMessages.h
@@ -46,6 +46,8 @@ enum EDebuggerMessage
 	DBMSG_INSPECTTHREADS,
 	DBMSG_STEPOVER,
 	DBMSG_STEPINTO,
+	DBMSG_INSPECTSCRIPTS,
+	DBMSG_EXECCOMMAND
 };
 
 #endif // DEBUGGER_MESSAGES_H_
\ No newline at end of file
diff --git a/neo/tools/debugger/DebuggerQuickWatchDlg.cpp b/neo/tools/debugger/DebuggerQuickWatchDlg.cpp
index 7cf18fe..cee73a8 100644
--- a/neo/tools/debugger/DebuggerQuickWatchDlg.cpp
+++ b/neo/tools/debugger/DebuggerQuickWatchDlg.cpp
@@ -55,7 +55,7 @@ bool rvDebuggerQuickWatchDlg::DoModal ( rvDebuggerWindow* window, int callstackD
 	mDebuggerWindow = window;
 	mVariable       = variable?variable:"";
 
-	DialogBoxParam ( window->GetInstance(), MAKEINTRESOURCE(IDD_DBG_QUICKWATCH), window->GetWindow(), DlgProc, (LONG)this );
+	DialogBoxParam ( window->GetInstance(), MAKEINTRESOURCE(IDD_DBG_QUICKWATCH), window->GetWindow(), DlgProc, (LPARAM)this );
 
 	return true;
 }
@@ -69,7 +69,7 @@ Dialog Procedure for the quick watch dialog
 */
 INT_PTR CALLBACK rvDebuggerQuickWatchDlg::DlgProc ( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam )
 {
-	rvDebuggerQuickWatchDlg* dlg = (rvDebuggerQuickWatchDlg*) GetWindowLong ( wnd, GWL_USERDATA );
+	rvDebuggerQuickWatchDlg* dlg = (rvDebuggerQuickWatchDlg*) GetWindowLongPtr ( wnd, GWLP_USERDATA);
 
 	switch ( msg )
 	{
@@ -128,7 +128,7 @@ INT_PTR CALLBACK rvDebuggerQuickWatchDlg::DlgProc ( HWND wnd, UINT msg, WPARAM w
 
 			// Attach the dialog class pointer to the window
 			dlg = (rvDebuggerQuickWatchDlg*) lparam;
-			SetWindowLong ( wnd, GWL_USERDATA, lparam );
+			SetWindowLongPtr ( wnd, GWLP_USERDATA, lparam );
 			dlg->mWnd = wnd;
 
 			GetClientRect ( wnd, &client );
diff --git a/neo/tools/debugger/DebuggerScript.cpp b/neo/tools/debugger/DebuggerScript.cpp
index fc15dfa..f1d6944 100644
--- a/neo/tools/debugger/DebuggerScript.cpp
+++ b/neo/tools/debugger/DebuggerScript.cpp
@@ -26,12 +26,14 @@ If you have questions concerning this license or the applicable additional terms
 ===========================================================================
 */
 
+#if defined( ID_ALLOW_TOOLS )
 #include "tools/edit_gui_common.h"
-
-
 #include "DebuggerApp.h"
+#else
+#include "debugger_common.h"
+#endif
+
 #include "DebuggerScript.h"
-#include "../../game/script/Script_Program.h"
 #include "../../ui/Window.h"
 #include "../../ui/UserInterfaceLocal.h"
 
@@ -57,6 +59,7 @@ rvDebuggerScript::~rvDebuggerScript ( void )
 	Unload ( );
 }
 
+
 /*
 ================
 rvDebuggerScript::Unload
@@ -72,10 +75,6 @@ void rvDebuggerScript::Unload ( void )
 	{
 		delete mInterface;
 	}
-	else
-	{
-		delete mProgram;
-	}
 
 	mContents  = NULL;
 	mProgram   = NULL;
@@ -116,60 +115,7 @@ bool rvDebuggerScript::Load ( const char* filename )
 
 	// Cleanup
 	fileSystem->FreeFile ( buffer );
-
-	// Now compile the script so we can tell what a valid line is, etc..  If its
-	// a gui file then we need to parse it using the userinterface system rather
-	// than the normal script compiler.
-	try
-	{
-		// Parse the script using the script compiler
-		mProgram = new idProgram;
-		mProgram->BeginCompilation ( );
-		mProgram->CompileFile ( SCRIPT_DEFAULT );
-
-		//BSM Nerve: Loads a game specific main script file
-		idStr gamedir = cvarSystem->GetCVarString( "fs_game" );
-		if(gamedir.Length() > 0) {
-
-			idStr scriptFile = va("script/%s_main.script", gamedir.c_str());
-			if(fileSystem->ReadFile(scriptFile.c_str(), NULL) > 0) {
-				mProgram.CompileFile(scriptFile.c_str());
-			}
-
-		}
-
-		// Make sure the file isnt already compiled before trying to compile it again
-		for ( int f = mProgram->NumFilenames() - 1; f >= 0; f -- )
-		{
-			idStr qpath;
-			qpath = fileSystem->OSPathToRelativePath ( mProgram->GetFilename ( f ) );
-			qpath.BackSlashesToSlashes ( );
-			if ( !qpath.Cmp ( filename ) )
-			{
-				break;
-			}
-		}
-
-		if ( f < 0 )
-		{
-			mProgram->CompileText ( filename, mContents, false );
-		}
-
-		mProgram->FinishCompilation ( );
-	}
-	catch ( idException& )
-	{
-		// Failed to parse the script so fail to load the file
-		delete mProgram;
-		mProgram = NULL;
-		delete[] mContents;
-		mContents = NULL;
-
-		// TODO: Should cache the error for the dialog box
-
-		return false;
-	}
-
+	
 	return true;
 }
 
@@ -194,21 +140,8 @@ Determines whether or not the given line number within the script is a valid lin
 */
 bool rvDebuggerScript::IsLineCode ( int linenumber )
 {
-	int i;
-
-	assert ( mProgram );
-
-	// Run through all the statements in the program and see if any match the
-	// linenumber that we are checking.
-	for ( i	= 0; i < mProgram->NumStatements ( ); i ++ )
-	{
-		if ( mProgram->GetStatement ( i ).linenumber == linenumber )
-		{
-			return true;
-		}
-	}
-
-	return false;
+	// Always report true here; the server decides whether the line contains code.
+	return true;
 }
 
 /*
diff --git a/neo/tools/debugger/DebuggerScript.h b/neo/tools/debugger/DebuggerScript.h
index d60086a..ae090f8 100644
--- a/neo/tools/debugger/DebuggerScript.h
+++ b/neo/tools/debugger/DebuggerScript.h
@@ -43,21 +43,22 @@ public:
 
 	const char*		GetFilename		( void );
 	const char*		GetContents		( void );
-
+	idProgram*		GetProgram		( void );
+#if 0// Test code
 	idProgram&		GetProgram		( void );
+#endif
 
 	bool			IsLineCode		( int linenumber );
 	bool			IsFileModified	( bool updateTime = false );
 
 protected:
-
 	void			Unload			( void );
 
 	idProgram*				mProgram;
 	idUserInterfaceLocal*	mInterface;
 	char*					mContents;
 	idStr					mFilename;
-	ID_TIME_T					mModifiedTime;
+	ID_TIME_T				mModifiedTime;
 };
 
 ID_INLINE const char* rvDebuggerScript::GetFilename	( void )
@@ -70,9 +71,10 @@ ID_INLINE const char* rvDebuggerScript::GetContents	( void )
 	return mContents?mContents:"";
 }
 
-ID_INLINE idProgram& rvDebuggerScript::GetProgram ( void )
+ID_INLINE idProgram* rvDebuggerScript::GetProgram ( void )
 {
-	return *mProgram;
+	return mProgram;
 }
 
+
 #endif // DEBUGGERSCRIPT_H_
\ No newline at end of file
diff --git a/neo/tools/debugger/DebuggerServer.cpp b/neo/tools/debugger/DebuggerServer.cpp
index 9606b19..2cfac7a 100644
--- a/neo/tools/debugger/DebuggerServer.cpp
+++ b/neo/tools/debugger/DebuggerServer.cpp
@@ -3,6 +3,8 @@
 
 Doom 3 GPL Source Code
 Copyright (C) 1999-2011 id Software LLC, a ZeniMax Media company.
+Copyright (C) 1999-2011 Raven Software
+Copyright (C) 2021 Harrie van Ginneken
 
 This file is part of the Doom 3 GPL Source Code ("Doom 3 Source Code").
 
@@ -26,17 +28,16 @@ If you have questions concerning this license or the applicable additional terms
 ===========================================================================
 */
 
+#if defined( ID_ALLOW_TOOLS )
 #include "tools/edit_gui_common.h"
+#include "DebuggerApp.h"
+#else
+#include "debugger_common.h"
+// the message buffer must be large enough to list all threads in mars_city1
+const int MAX_MSGLEN = 8600;
+#endif
 
 
-#include "../../game/gamesys/Event.h"
-#include "../../game/gamesys/Class.h"
-#include "../../game/script/Script_Program.h"
-#include "../../game/script/Script_Interpreter.h"
-#include "../../game/script/Script_Thread.h"
-#include "../../game/script/Script_Compiler.h"
-#include "../../framework/sync/Msg.h"
-#include "DebuggerApp.h"
 #include "DebuggerServer.h"
 
 /*
@@ -51,10 +52,17 @@ rvDebuggerServer::rvDebuggerServer ( )
 	mBreak				= false;
 	mBreakStepOver		= false;
 	mBreakStepInto		= false;
-	mGameThread			= NULL;
+	mGameThreadBreakCond = NULL;
+	mGameThreadBreakLock = NULL;
 	mLastStatementLine	= -1;
 	mBreakStepOverFunc1 = NULL;
 	mBreakStepOverFunc2 = NULL;
+	mBreakInstructionPointer = 0;
+	mBreakInterpreter = NULL;
+	mBreakProgram = NULL;
+	mGameDLLHandle = 0;
+	mBreakStepOverDepth = 0;
+	mCriticalSection = NULL;
 }
 
 /*
@@ -82,15 +90,17 @@ bool rvDebuggerServer::Initialize ( void )
 		return false;
 	}
 
-	// Get a copy of the game thread handle so we can suspend the thread on a break
-	DuplicateHandle ( GetCurrentProcess(), GetCurrentThread ( ), GetCurrentProcess(), &mGameThread, 0, FALSE, DUPLICATE_SAME_ACCESS );
+	// we're using a condition variable to pause the game thread in rvDebuggerServer::Break()
+	// until rvDebuggerServer::Resume() is called (from another thread)
+	mGameThreadBreakCond = SDL_CreateCond();
+	mGameThreadBreakLock = SDL_CreateMutex();
 
 	// Create a critical section to ensure that the shared thread
 	// variables are protected
-	InitializeCriticalSection ( &mCriticalSection );
+	mCriticalSection = SDL_CreateMutex();
 
 	// Server must be running on the local host on port 28980
-	Sys_StringToNetAdr ( "localhost", &mClientAdr, true );
+	Sys_StringToNetAdr ( com_dbgClientAdr.GetString( ), &mClientAdr, true );
 	mClientAdr.port = 27981;
 
 	// Attempt to let the server know we are here.  The server may not be running so this
@@ -102,7 +112,7 @@ bool rvDebuggerServer::Initialize ( void )
 
 void rvDebuggerServer::OSPathToRelativePath( const char *osPath, idStr &qpath )
 {
-	if ( strchr( osPath, ':' ) )
+	if ( strchr( osPath, ':' ) ) // XXX: what about linux?
 	{
 		qpath = fileSystem->OSPathToRelativePath( osPath );
 	}
@@ -130,8 +140,16 @@ void rvDebuggerServer::Shutdown ( void )
 
 	mPort.Close();
 
+	Resume(); // just in case we're still paused
+
 	// dont need the crit section anymore
-	DeleteCriticalSection ( &mCriticalSection );
+	SDL_DestroyMutex( mCriticalSection );
+	mCriticalSection = NULL;
+
+	SDL_DestroyCond( mGameThreadBreakCond );
+	mGameThreadBreakCond = NULL;
+	SDL_DestroyMutex( mGameThreadBreakLock );
+	mGameThreadBreakLock = NULL;
 }
 
 /*
@@ -144,39 +162,46 @@ Process all incoming network messages from the debugger client
 bool rvDebuggerServer::ProcessMessages ( void )
 {
 	netadr_t adrFrom;
-	msg_t	 msg;
+	idBitMsg	 msg;
 	byte	 buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-
 	// Check for pending udp packets on the debugger port
-	while ( mPort.GetPacket ( adrFrom, msg.data, msg.cursize, msg.maxsize ) )
+	int msgSize;
+	while ( mPort.GetPacket ( adrFrom, buffer, msgSize, MAX_MSGLEN) )
 	{
-		unsigned short command;
-
-		// Only accept packets from the debugger server for security reasons
-		if ( !Sys_CompareNetAdrBase ( adrFrom, mClientAdr ) )
-		{
-			continue;
+		short command;
+		msg.Init(buffer, sizeof(buffer));
+		msg.SetSize(msgSize);
+		msg.BeginReading();
+		
+		if ( adrFrom.type != NA_LOOPBACK ) {
+			// Only accept packets from the debugger client for security reasons
+			if ( !Sys_CompareNetAdrBase( adrFrom, mClientAdr ) )
+				continue;
 		}
 
-		command = (unsigned short) MSG_ReadShort ( &msg );
+		command = msg.ReadShort( );
 
 		switch ( command )
 		{
 			case DBMSG_CONNECT:
 				mConnected = true;
 				SendMessage ( DBMSG_CONNECTED );
+				HandleInspectScripts ( NULL );
+				com_editors |= EDITOR_DEBUGGER;
 				break;
 
 			case DBMSG_CONNECTED:
 				mConnected = true;
+				HandleInspectScripts( NULL );
+				com_editors |= EDITOR_DEBUGGER;
 				break;
 
 			case DBMSG_DISCONNECT:
 				ClearBreakpoints ( );
 				Resume ( );
 				mConnected = false;
+				com_editors &= ~EDITOR_DEBUGGER;
 				break;
 
 			case DBMSG_ADDBREAKPOINT:
@@ -188,7 +213,7 @@ bool rvDebuggerServer::ProcessMessages ( void )
 				break;
 
 			case DBMSG_RESUME:
-				Resume ( );
+				HandleResume ( &msg );
 				break;
 
 			case DBMSG_BREAK:
@@ -197,11 +222,11 @@ bool rvDebuggerServer::ProcessMessages ( void )
 
 			case DBMSG_STEPOVER:
 				mBreakStepOver = true;
-				mBreakStepOverDepth = mBreakInterpreter->GetCallstackDepth ( );
-				mBreakStepOverFunc1 = mBreakInterpreter->GetCallstack()[mBreakInterpreter->GetCallstackDepth()].f;
-				if ( mBreakInterpreter->GetCallstackDepth() > 0 )
+				mBreakStepOverDepth = ((idGameEditExt*) gameEdit)->GetInterpreterCallStackDepth(mBreakInterpreter);
+				mBreakStepOverFunc1 = ((idGameEditExt*) gameEdit)->GetInterpreterCallStackFunction(mBreakInterpreter);
+				if (mBreakStepOverDepth)
 				{
-					mBreakStepOverFunc2 = mBreakInterpreter->GetCallstack()[mBreakInterpreter->GetCallstackDepth()-1].f;
+					mBreakStepOverFunc2 = ((idGameEditExt*) gameEdit)->GetInterpreterCallStackFunction(mBreakInterpreter,mBreakStepOverDepth - 1);
 				}
 				else
 				{
@@ -226,6 +251,14 @@ bool rvDebuggerServer::ProcessMessages ( void )
 			case DBMSG_INSPECTTHREADS:
 				HandleInspectThreads ( &msg );
 				break;
+
+			case DBMSG_INSPECTSCRIPTS:
+				HandleInspectScripts( &msg );
+				break;
+
+			case DBMSG_EXECCOMMAND:
+				HandleExecCommand( &msg );
+				break;
 		}
 	}
 
@@ -241,13 +274,14 @@ Send a message with no data to the debugger server.
 */
 void rvDebuggerServer::SendMessage ( EDebuggerMessage dbmsg )
 {
-	msg_t	 msg;
+	idBitMsg	 msg;
 	byte	 buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)dbmsg );
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting();
+	msg.WriteShort ( (short)dbmsg );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize() );
 }
 
 /*
@@ -255,29 +289,44 @@ void rvDebuggerServer::SendMessage ( EDebuggerMessage dbmsg )
 rvDebuggerServer::HandleAddBreakpoint
 
 Handle the DBMSG_ADDBREAKPOINT message being sent by the debugger client.  This
-message is handled by adding a new breakpoint to the breakpoint list with the
+message is handled by first checking that the requested line contains code;
+if it does, a new breakpoint is added to the breakpoint list with the
 data supplied in the message.
 ================
 */
-void rvDebuggerServer::HandleAddBreakpoint ( msg_t* msg )
+void rvDebuggerServer::HandleAddBreakpoint ( idBitMsg* msg )
 {
 	bool onceOnly = false;
 	long lineNumber;
 	long id;
-	char filename[MAX_PATH];
+	char filename[2048]; // DG: randomly chose this size
 
 	// Read the breakpoint info
-	onceOnly   = MSG_ReadBits ( msg, 1 ) ? true : false;
-	lineNumber = MSG_ReadInt ( msg );
-	id		   = MSG_ReadInt ( msg );
+	onceOnly = msg->ReadBits( 1 ) ? true : false;
+	lineNumber = msg->ReadInt ( );
+	id		   = msg->ReadInt ( );
 
-	MSG_ReadString ( msg, filename, MAX_PATH );
+	msg->ReadString ( filename, sizeof(filename) );
 
-	// Since breakpoints are used by both threads we need to
-	// protect them with a crit section
-	EnterCriticalSection ( &mCriticalSection );
-	mBreakpoints.Append ( new rvDebuggerBreakpoint ( filename, lineNumber, id ) );
-	LeaveCriticalSection ( &mCriticalSection );
+	// check for a statement at the requested breakpoint location
+	if (!((idGameEditExt*) gameEdit)->IsLineCode(filename, lineNumber))
+	{
+		idBitMsg	msgOut;
+		byte		buffer[MAX_MSGLEN];
+
+		msgOut.Init(buffer, sizeof(buffer));
+		msgOut.BeginWriting();
+		msgOut.WriteShort((short)DBMSG_REMOVEBREAKPOINT);
+		msgOut.WriteInt(lineNumber);
+		msgOut.WriteString(filename);
+		SendPacket(msgOut.GetData(), msgOut.GetSize());
+		return;
+	}
+
+
+	SDL_LockMutex( mCriticalSection );
+	mBreakpoints.Append ( new rvDebuggerBreakpoint ( filename, lineNumber, id, onceOnly ) );
+	SDL_UnlockMutex( mCriticalSection );
 }
 
 /*
@@ -289,17 +338,17 @@ message is handled by removing the breakpoint that matches the given id from the
 list.
 ================
 */
-void rvDebuggerServer::HandleRemoveBreakpoint ( msg_t* msg )
+void rvDebuggerServer::HandleRemoveBreakpoint ( idBitMsg* msg )
 {
 	int i;
 	int id;
 
 	// ID that we are to remove
-	id = MSG_ReadInt ( msg );
+	id = msg->ReadInt ( );
 
 	// Since breakpoints are used by both threads we need to
 	// protect them with a crit section
-	EnterCriticalSection ( &mCriticalSection );
+	SDL_LockMutex( mCriticalSection );
 
 	// Find the breakpoint that matches the given id and remove it from the list
 	for ( i = 0; i < mBreakpoints.Num(); i ++ )
@@ -312,52 +361,21 @@ void rvDebuggerServer::HandleRemoveBreakpoint ( msg_t* msg )
 		}
 	}
 
-	LeaveCriticalSection ( &mCriticalSection );
+	SDL_UnlockMutex( mCriticalSection );
 }
 
 /*
 ================
-rvDebuggerServer::MSG_WriteCallstackFunc
+rvDebuggerServer::HandleResume
 
-Writes a single callstack entry to the given message
+Resume the game thread.
 ================
+
 */
-void rvDebuggerServer::MSG_WriteCallstackFunc ( msg_t* msg, const prstack_t* stack )
+void rvDebuggerServer::HandleResume(idBitMsg* msg)
 {
-	const statement_t*	st;
-	const function_t*	func;
-
-	func  = stack->f;
-
-	// If the function is unknown then just fill in with default data.
-	if ( !func )
-	{
-		MSG_WriteString ( msg, "<UNKNOWN>" );
-		MSG_WriteString ( msg, "<UNKNOWN>" );
-		MSG_WriteInt ( msg, 0 );
-		return;
-	}
-	else
-	{
-		MSG_WriteString ( msg, va("%s( ??? )", func->Name() ) );
-	}
-
-	// Use the calling statement as the filename and linenumber where
-	// the call was made from
-	st = &mBreakProgram->GetStatement ( stack->s );
-	if ( st )
-	{
-		idStr qpath;
-		OSPathToRelativePath(mBreakProgram->GetFilename( st->file ), qpath);
-		qpath.BackSlashesToSlashes ( );
-		MSG_WriteString ( msg, qpath );
-		MSG_WriteInt ( msg, st->linenumber );
-	}
-	else
-	{
-		MSG_WriteString ( msg, "<UNKNOWN>" );
-		MSG_WriteInt ( msg, 0 );
-	}
+	//Empty msg
+	Resume();
 }
 
 /*
@@ -368,31 +386,18 @@ Handle an incoming inspect callstack message by sending a message
 back to the client with the callstack data.
 ================
 */
-void rvDebuggerServer::HandleInspectCallstack ( msg_t* in_msg )
+void rvDebuggerServer::HandleInspectCallstack ( idBitMsg* msg )
 {
-	msg_t		 msg;
+	idBitMsg	 msgOut;
 	byte		 buffer[MAX_MSGLEN];
-	int			 i;
-	prstack_t	 temp;
-
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_INSPECTCALLSTACK );
 
-	MSG_WriteShort ( &msg, (int)mBreakInterpreter->GetCallstackDepth ( ) );
+	msgOut.Init(buffer, sizeof( buffer ) );
+	msgOut.BeginWriting();
+	msgOut.WriteShort ( (short)DBMSG_INSPECTCALLSTACK );
 
-	// write out the current function
-	temp.f = mBreakInterpreter->GetCurrentFunction ( );
-	temp.s = 0;
-	temp.stackbase = 0;
-	MSG_WriteCallstackFunc ( &msg, &temp );
-
-	// Run through all of the callstack and write each to the msg
-	for ( i = mBreakInterpreter->GetCallstackDepth ( ) - 1; i > 0; i -- )
-	{
-		MSG_WriteCallstackFunc ( &msg, mBreakInterpreter->GetCallstack ( ) + i );
-	}
+	((idGameEditExt*) gameEdit)->MSG_WriteInterpreterInfo(&msgOut, mBreakInterpreter, mBreakProgram, mBreakInstructionPointer);
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket (msgOut.GetData(), msgOut.GetSize() );
 }
 
 /*
@@ -402,35 +407,67 @@ rvDebuggerServer::HandleInspectThreads
 Send the list of the current threads in the interpreter back to the debugger client
 ================
 */
-void rvDebuggerServer::HandleInspectThreads ( msg_t* in_msg )
+void rvDebuggerServer::HandleInspectThreads ( idBitMsg* msg )
 {
-	msg_t		 msg;
-	byte		 buffer[MAX_MSGLEN];
-	int			 i;
+	idBitMsg	msgOut;
+	byte		buffer[MAX_MSGLEN];
+	int			i;
 
 	// Initialize the message
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_INSPECTTHREADS );
+	msgOut.Init( buffer, sizeof( buffer ) );
+	msgOut.SetAllowOverflow(true);
+	msgOut.BeginWriting();
+	msgOut.WriteShort ( (short)DBMSG_INSPECTTHREADS );
 
 	// Write the number of threads to the message
-	MSG_WriteShort ( &msg, (int)idThread::GetThreads().Num() );
+	msgOut.WriteShort ((short)((idGameEditExt*) gameEdit)->GetTotalScriptThreads() );
 
 	// Loop through all of the threads and write their name and number to the message
-	for ( i = 0; i < idThread::GetThreads().Num(); i ++ )
+	for ( i = 0; i < ((idGameEditExt*) gameEdit)->GetTotalScriptThreads(); i ++ )
 	{
-		idThread* thread = idThread::GetThreads()[i];
-
-		MSG_WriteString ( &msg, thread->GetThreadName ( ) );
-		MSG_WriteInt ( &msg, thread->GetThreadNum ( ) );
-
-		MSG_WriteBits ( &msg, (int)(thread == mBreakInterpreter->GetThread ( )), 1 );
-		MSG_WriteBits ( &msg, (int)thread->IsDoneProcessing(), 1 );
-		MSG_WriteBits ( &msg, (int)thread->IsWaiting(), 1 );
-		MSG_WriteBits ( &msg, (int)thread->IsDying(), 1 );
+		((idGameEditExt*) gameEdit)->MSG_WriteThreadInfo(&msgOut,((idGameEditExt*) gameEdit)->GetThreadByIndex(i), mBreakInterpreter);
 	}
 
 	// Send off the inspect threads packet to the debugger client
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket (msgOut.GetData(), msgOut.GetSize() );
+}
+
+/*
+================
+rvDebuggerServer::HandleExecCommand
+
+Append the console command sent by the debugger client to the command buffer for execution
+================
+*/
+void rvDebuggerServer::HandleExecCommand( idBitMsg *msg ) {
+	char cmdStr[2048]; // HvG: randomly chose this size
+
+	msg->ReadString( cmdStr, sizeof( cmdStr ) );
+	cmdSystem->BufferCommandText( CMD_EXEC_APPEND, cmdStr );	// valid command
+	cmdSystem->BufferCommandText( CMD_EXEC_APPEND, "\n" );
+}
+
+
+/*
+================
+rvDebuggerServer::HandleInspectScripts
+
+Send the list of the currently loaded scripts in the interpreter back to the debugger client
+================
+*/
+void rvDebuggerServer::HandleInspectScripts( idBitMsg* msg )
+{
+	idBitMsg	 msgOut;
+	byte		 buffer[MAX_MSGLEN];
+
+	// Initialize the message
+	msgOut.Init(buffer, sizeof(buffer));
+	msgOut.BeginWriting();
+	msgOut.WriteShort((short)DBMSG_INSPECTSCRIPTS);
+
+	((idGameEditExt*) gameEdit)->MSG_WriteScriptList( &msgOut );
+
+	SendPacket(msgOut.GetData(), msgOut.GetSize());
 }
 
 /*
@@ -440,7 +477,7 @@ rvDebuggerServer::HandleInspectVariable
 Respondes to a request from the debugger client to inspect the value of a given variable
 ================
 */
-void rvDebuggerServer::HandleInspectVariable ( msg_t* in_msg )
+void rvDebuggerServer::HandleInspectVariable ( idBitMsg* msg )
 {
 	char varname[256];
 	int  scopeDepth;
@@ -450,28 +487,29 @@ void rvDebuggerServer::HandleInspectVariable ( msg_t* in_msg )
 		return;
 	}
 
-	scopeDepth = (short)MSG_ReadShort ( in_msg );
-	MSG_ReadString ( in_msg, varname, 256 );
+	scopeDepth = (short)msg->ReadShort ( );
+	msg->ReadString ( varname, 256 );
 
 	idStr varvalue;
 
-	msg_t		 msg;
+	idBitMsg	 msgOut;
 	byte		 buffer[MAX_MSGLEN];
 
 	// Initialize the message
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_INSPECTVARIABLE );
+	msgOut.Init( buffer, sizeof( buffer ) );
+	msgOut.BeginWriting();
+	msgOut.WriteShort ( (short)DBMSG_INSPECTVARIABLE );
 
-	if ( !mBreakInterpreter->GetRegisterValue ( varname, varvalue, scopeDepth ) )
+	if (!((idGameEditExt*) gameEdit)->GetRegisterValue(mBreakInterpreter, varname, varvalue, scopeDepth ) )
 	{
 		varvalue = "???";
 	}
 
-	MSG_WriteShort ( &msg, (short)scopeDepth );
-	MSG_WriteString ( &msg, varname );
-	MSG_WriteString ( &msg, varvalue );
+	msgOut.WriteShort ( (short)scopeDepth );
+	msgOut.WriteString ( varname );
+	msgOut.WriteString ( varvalue );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket (msgOut.GetData(), msgOut.GetSize() );
 }
 
 /*
@@ -484,7 +522,6 @@ Check to see if any breakpoints have been hit.  This includes "break next",
 */
 void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram* program, int instructionPointer )
 {
-	const statement_t*	st;
 	const char*			filename;
 	int					i;
 
@@ -492,23 +529,24 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 		return;
 	}
 
+	
 	// Grab the current statement and the filename that it came from
-	st       = &program->GetStatement ( instructionPointer );
-	filename = program->GetFilename ( st->file );
+	filename = ((idGameEditExt*) gameEdit)->GetFilenameForStatement(program, instructionPointer);
+	int linenumber = ((idGameEditExt*) gameEdit)->GetLineNumberForStatement(program, instructionPointer);
 
 	// Operate on lines, not statements
-	if ( mLastStatementLine == st->linenumber && mLastStatementFile == st->file )
+	if ( mLastStatementLine == linenumber && mLastStatementFile == filename)
 	{
 		return;
 	}
-
+	
 	// Save the last visited line and file so we can prevent
 	// double breaks on lines with more than one statement
-	mLastStatementFile = idStr( st->file );
-	mLastStatementLine = st->linenumber;
+	mLastStatementFile = idStr(filename);
+	mLastStatementLine = linenumber;
 
 	// Reset stepping when the last function on the callstack is returned from
-	if ( st->op == OP_RETURN && interpreter->GetCallstackDepth ( ) <= 1 )
+	if ( ((idGameEditExt*) gameEdit)->ReturnedFromFunction(program, interpreter,instructionPointer))
 	{
 		mBreakStepOver = false;
 		mBreakStepInto = false;
@@ -517,6 +555,7 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 	// See if we are supposed to break on the next script line
 	if ( mBreakNext )
 	{
+		HandleInspectScripts(NULL);
 		Break ( interpreter, program, instructionPointer );
 		return;
 	}
@@ -524,9 +563,8 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 	// Only break on the same callstack depth and thread as the break over
 	if ( mBreakStepOver )
 	{
-		if ( ( interpreter->GetCurrentFunction ( ) == mBreakStepOverFunc1 ||
-			   interpreter->GetCurrentFunction ( ) == mBreakStepOverFunc2    )&&
-			 ( interpreter->GetCallstackDepth ( )  <= mBreakStepOverDepth ) )
+		//virtual bool CheckForBreakpointHit(interpreter,function1,function2,depth)
+		if (((idGameEditExt*) gameEdit)->CheckForBreakPointHit(interpreter, mBreakStepOverFunc1, mBreakStepOverFunc2, mBreakStepOverDepth))
 		{
 			Break ( interpreter, program, instructionPointer );
 			return;
@@ -536,6 +574,7 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 	// See if we are supposed to break on the next line
 	if ( mBreakStepInto )
 	{
+		HandleInspectScripts(NULL);
 		// Break
 		Break ( interpreter, program, instructionPointer );
 		return;
@@ -545,7 +584,7 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 	OSPathToRelativePath(filename,qpath);
 	qpath.BackSlashesToSlashes ( );
 
-	EnterCriticalSection ( &mCriticalSection );
+	SDL_LockMutex( mCriticalSection );
 
 	// Check all the breakpoints
 	for ( i = 0; i < mBreakpoints.Num ( ); i ++ )
@@ -553,30 +592,50 @@ void rvDebuggerServer::CheckBreakpoints	( idInterpreter* interpreter, idProgram*
 		rvDebuggerBreakpoint* bp = mBreakpoints[i];
 
 		// Skip if not match of the line number
-		if ( st->linenumber != bp->GetLineNumber ( ) )
+		if ( linenumber != bp->GetLineNumber ( ) )
 		{
 			continue;
 		}
 
 		// Skip if no match of the filename
-		if ( idStr::Icmp ( bp->GetFilename(), qpath ) )
+		if ( idStr::Icmp ( bp->GetFilename(), qpath.c_str() ) )
 		{
 			continue;
 		}
 
+		// DG: onceOnly support
+		if ( bp->GetOnceOnly() ) {
+			// we'll do the one Break() a few lines below; remove it here while mBreakpoints is unmodified
+			// (it can be modified from the client while in Break() below)
+			mBreakpoints.RemoveIndex( i );
+			delete bp;
+
+			// also tell client to remove the breakpoint
+			idBitMsg	msgOut;
+			byte		buffer[MAX_MSGLEN];
+			msgOut.Init( buffer, sizeof( buffer ) );
+			msgOut.BeginWriting();
+			msgOut.WriteShort( (short)DBMSG_REMOVEBREAKPOINT );
+			msgOut.WriteInt( linenumber );
+			msgOut.WriteString( qpath.c_str() );
+			SendPacket( msgOut.GetData(), msgOut.GetSize() );
+		}
+		// DG end
+
 		// Pop out of the critical section so we dont get stuck
-		LeaveCriticalSection ( &mCriticalSection );
+		SDL_UnlockMutex( mCriticalSection );
 
+		HandleInspectScripts(NULL);
 		// We hit a breakpoint, so break
 		Break ( interpreter, program, instructionPointer );
 
 		// Back into the critical section since we are going to have to leave it
-		EnterCriticalSection ( &mCriticalSection );
+		SDL_LockMutex( mCriticalSection );
 
 		break;
 	}
 
-	LeaveCriticalSection ( &mCriticalSection );
+	SDL_UnlockMutex( mCriticalSection );
 }
 
 /*
@@ -589,9 +648,8 @@ the game has been halted
 */
 void rvDebuggerServer::Break ( idInterpreter* interpreter, idProgram* program, int instructionPointer )
 {
-	msg_t				msg;
+	idBitMsg			msg;
 	byte				buffer[MAX_MSGLEN];
-	const statement_t*	st;
 	const char*			filename;
 
 	// Clear all the break types
@@ -600,12 +658,10 @@ void rvDebuggerServer::Break ( idInterpreter* interpreter, idProgram* program, i
 	mBreakNext     = false;
 
 	// Grab the current statement and the filename that it came from
-	st       = &program->GetStatement ( instructionPointer );
-	filename = program->GetFilename ( st->file );
-
-	idStr qpath;
-	OSPathToRelativePath(filename, qpath);
-	qpath.BackSlashesToSlashes ( );
+	filename = ((idGameEditExt*) gameEdit)->GetFilenameForStatement(program,instructionPointer);
+	int linenumber = ((idGameEditExt*) gameEdit)->GetLineNumberForStatement(program, instructionPointer);
+	idStr fileStr = filename;
+	fileStr.BackSlashesToSlashes();
 
 	// Give the mouse cursor back to the world
 	Sys_GrabMouseCursor( false );
@@ -617,19 +673,33 @@ void rvDebuggerServer::Break ( idInterpreter* interpreter, idProgram* program, i
 	mBreakInstructionPointer = instructionPointer;
 
 	// Inform the debugger of the breakpoint hit
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_BREAK );
-	MSG_WriteInt ( &msg, st->linenumber );
-	MSG_WriteString ( &msg, qpath );
-	SendPacket ( msg.data, msg.cursize );
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting();
+	msg.WriteShort ( (short)DBMSG_BREAK );
+	msg.WriteInt ( linenumber );
+	msg.WriteString ( fileStr.c_str() );
+
+	//msg.WriteInt64( (int64_t)mBreakProgram );
+
+	SendPacket ( msg.GetData(), msg.GetSize() );
 
 	// Suspend the game thread.  Since this will be called from within the main game thread
 	// execution wont return until after the thread is resumed
-	SuspendThread ( mGameThread );
+	// DG: the original code used Win32 SuspendThread() here, but as there is no equivalent
+	//     function in SDL and as this is only called within the main game thread anyway,
+	//     just use a condition variable to put this thread to sleep until Resume() has cleared mBreak
+	SDL_LockMutex( mGameThreadBreakLock );
+	while ( mBreak ) {
+		SDL_CondWait( mGameThreadBreakCond, mGameThreadBreakLock );
+	}
+	SDL_UnlockMutex( mGameThreadBreakLock );
 
 	// Let the debugger client know that we have started back up again
 	SendMessage ( DBMSG_RESUMED );
 
+	// this should be platform specific
+	// TODO: maybe replace with SDL code? or does it not matter if debugger client runs on another machine?
+#if defined( ID_ALLOW_TOOLS )
 	// This is to give some time between the keypress that
 	// told us to resume and the setforeground window.  Otherwise the quake window
 	// would just flash
@@ -640,8 +710,10 @@ void rvDebuggerServer::Break ( idInterpreter* interpreter, idProgram* program, i
 	SetActiveWindow ( win32.hWnd );
 	UpdateWindow ( win32.hWnd );
 	SetFocus ( win32.hWnd );
+#endif
 
 	// Give the mouse cursor back to the game
+	// HVG_Note : there be dragons here. somewhere.
 	Sys_GrabMouseCursor( true );
 
 	// Clear all commands that were generated before we went into suspended mode.  This is
@@ -664,10 +736,11 @@ void rvDebuggerServer::Resume ( void )
 		return;
 	}
 
-	mBreak = false;
-
 	// Start the game thread back up
-	ResumeThread ( mGameThread );
+	SDL_LockMutex( mGameThreadBreakLock );
+	mBreak = false;
+	SDL_CondSignal( mGameThreadBreakCond);
+	SDL_UnlockMutex( mGameThreadBreakLock );
 }
 
 /*
@@ -703,12 +776,13 @@ void rvDebuggerServer::Print ( const char* text )
 		return;
 	}
 
-	msg_t	 msg;
+	idBitMsg msg;
 	byte	 buffer[MAX_MSGLEN];
 
-	MSG_Init( &msg, buffer, sizeof( buffer ) );
-	MSG_WriteShort ( &msg, (int)DBMSG_PRINT );
-	MSG_WriteString ( &msg, text );
+	msg.Init( buffer, sizeof( buffer ) );
+	msg.BeginWriting();
+	msg.WriteShort ( (short)DBMSG_PRINT );
+	msg.WriteString ( text );
 
-	SendPacket ( msg.data, msg.cursize );
+	SendPacket ( msg.GetData(), msg.GetSize() );
 }
diff --git a/neo/tools/debugger/DebuggerServer.h b/neo/tools/debugger/DebuggerServer.h
index 04b9782..efa100a 100644
--- a/neo/tools/debugger/DebuggerServer.h
+++ b/neo/tools/debugger/DebuggerServer.h
@@ -28,20 +28,16 @@ If you have questions concerning this license or the applicable additional terms
 #ifndef DEBUGGERSERVER_H_
 #define DEBUGGERSERVER_H_
 
-#ifndef DEBUGGERMESSAGES_H_
-#include "DebuggerMessages.h"
-#endif
 
-#ifndef DEBUGGERBREAKPOINT_H_
+#include "DebuggerMessages.h"
 #include "DebuggerBreakpoint.h"
-#endif
+#include "framework/Game.h"
+#include <SDL.h>
+
 
-#ifndef __GAME_LOCAL_H__
-#include "../../game/Game.h"
-#endif
 
-class idInterpreter;
-class idProgram;
+class function_t;
+typedef struct prstack_s prstack_t;
 
 class rvDebuggerServer
 {
@@ -50,31 +46,52 @@ public:
 	rvDebuggerServer ( );
 	~rvDebuggerServer ( );
 
-	bool		Initialize			( void );
-	void		Shutdown			( void );
+	bool		Initialize				( void );
+	void		Shutdown				( void );
+
+	bool		ProcessMessages			( void );
+
+	bool		IsConnected				( void );
+
+	void		CheckBreakpoints		( idInterpreter *interpreter, idProgram *program, int instructionPointer );
 
-	bool		ProcessMessages		( void );
+	void		Print					( const char *text );
 
-	bool		IsConnected			( void );
+	void		OSPathToRelativePath	( const char *osPath, idStr &qpath );
 
-	void		CheckBreakpoints	( idInterpreter* interpreter, idProgram* program, int instructionPointer );
+	bool		GameSuspended			( void );
+private:
+
+	void		ClearBreakpoints		( void );
 
-	void		Print				( const char* text );
+	void		Break					( idInterpreter *interpreter, idProgram *program, int instructionPointer );
+	void		Resume					( void );
 
-	void		OSPathToRelativePath( const char *osPath, idStr &qpath );
+	void		SendMessage				( EDebuggerMessage dbmsg );
+	void		SendPacket				( void* data, int datasize );
 
-protected:
+	// Message handlers
+	void		HandleAddBreakpoint		( idBitMsg *msg );
+	void		HandleRemoveBreakpoint	( idBitMsg *msg );
+	void		HandleResume			( idBitMsg *msg );
+	void		HandleInspectVariable	( idBitMsg *msg );
+	void		HandleInspectCallstack	( idBitMsg *msg );
+	void		HandleInspectThreads	( idBitMsg *msg );
+	void		HandleInspectScripts	( idBitMsg *msg );
+	void		HandleExecCommand		( idBitMsg *msg );
+	////
 
-	// protected member variables
 	bool							mConnected;
 	netadr_t						mClientAdr;
 	idPort							mPort;
 	idList<rvDebuggerBreakpoint*>	mBreakpoints;
-	CRITICAL_SECTION				mCriticalSection;
+	SDL_mutex*						mCriticalSection;
 
-	HANDLE							mGameThread;
 
+	SDL_cond*						mGameThreadBreakCond;
+	SDL_mutex*						mGameThreadBreakLock;
 	bool							mBreak;
+
 	bool							mBreakNext;
 	bool							mBreakStepOver;
 	bool							mBreakStepInto;
@@ -87,27 +104,9 @@ protected:
 
 	idStr							mLastStatementFile;
 	int								mLastStatementLine;
+	uintptr_t						mGameDLLHandle;
+	idStrList						mScriptFileList;
 
-private:
-
-	void		ClearBreakpoints				( void );
-
-	void		Break							( idInterpreter* interpreter, idProgram* program, int instructionPointer );
-	void		Resume							( void );
-
-	void		SendMessage						( EDebuggerMessage dbmsg );
-	void		SendPacket						( void* data, int datasize );
-
-	// Message handlers
-	void		HandleAddBreakpoint				( msg_t* msg );
-	void		HandleRemoveBreakpoint			( msg_t* msg );
-	void		HandleResume					( msg_t* msg );
-	void		HandleInspectVariable			( msg_t* msg );
-	void		HandleInspectCallstack			( msg_t* msg );
-	void		HandleInspectThreads			( msg_t* msg );
-
-	// MSG helper routines
-	void		MSG_WriteCallstackFunc			( msg_t* msg, const prstack_t* stack );
 };
 
 /*
@@ -125,9 +124,19 @@ ID_INLINE bool rvDebuggerServer::IsConnected ( void )
 rvDebuggerServer::SendPacket
 ================
 */
-ID_INLINE void rvDebuggerServer::SendPacket ( void* data, int size )
+ID_INLINE void rvDebuggerServer::SendPacket ( void *data, int size )
 {
 	mPort.SendPacket ( mClientAdr, data, size );
 }
 
+/*
+================
+rvDebuggerServer::GameSuspended
+================
+*/
+ID_INLINE bool rvDebuggerServer::GameSuspended( void )
+{
+	return mBreak;
+}
+
 #endif // DEBUGGERSERVER_H_
diff --git a/neo/tools/debugger/DebuggerWindow.cpp b/neo/tools/debugger/DebuggerWindow.cpp
index fbf33e7..7306423 100644
--- a/neo/tools/debugger/DebuggerWindow.cpp
+++ b/neo/tools/debugger/DebuggerWindow.cpp
@@ -35,7 +35,7 @@ If you have questions concerning this license or the applicable additional terms
 #include "DebuggerQuickWatchDlg.h"
 #include "DebuggerFindDlg.h"
 
-#define DEBUGGERWINDOWCLASS		"QUAKE4_DEBUGGER_WINDOW"
+#define DEBUGGERWINDOWCLASS		"DHEWM3_DEBUGGER_WINDOW"
 #define ID_DBG_WINDOWMIN		18900
 #define ID_DBG_WINDOWMAX		19900
 
@@ -49,6 +49,9 @@ If you have questions concerning this license or the applicable additional terms
 #define IDC_DBG_WATCH			31007
 #define IDC_DBG_THREADS			31008
 #define IDC_DBG_TOOLBAR			31009
+#define IDC_DBG_SCRIPTLIST		31010
+#define IDC_DBG_CONSOLEINPUT	31011
+#define IDC_DBG_BREAKLIST		31012
 
 #define ID_DBG_FILE_MRU1		10000
 
@@ -167,7 +170,7 @@ bool rvDebuggerWindow::Create ( HINSTANCE instance )
 
 	UpdateTitle ( );
 
-	Printf ( "Quake 4 Script Debugger v0.1\n\n" );
+	Printf ( "Dhewm3 Script Debugger v1.1\n\n" );
 
 	ShowWindow ( mWnd, SW_SHOW );
 	UpdateWindow ( mWnd );
@@ -248,7 +251,7 @@ LRESULT CALLBACK rvDebuggerWindow::ScriptWndProc ( HWND wnd, UINT msg, WPARAM wp
 {
 	static int		  lastStart = -1;
 	static int		  lastEnd   = -1;
-	rvDebuggerWindow* window    = (rvDebuggerWindow*)GetWindowLong ( wnd, GWL_USERDATA );
+	rvDebuggerWindow* window    = (rvDebuggerWindow*)GetWindowLongPtr ( wnd, GWLP_USERDATA );
 	WNDPROC			  wndproc   = window->mOldScriptProc;
 
 	switch ( msg )
@@ -347,6 +350,23 @@ LRESULT CALLBACK rvDebuggerWindow::ScriptWndProc ( HWND wnd, UINT msg, WPARAM wp
 
 			break;
 		}
+		case WM_SIZE:
+		{
+			float scaling_factor = Win_GetWindowScalingFactor(wnd);
+			int s18 = int(18 * scaling_factor);
+			int s10 = int(10 * scaling_factor);
+
+			RECT rect;
+			window->mMarginSize = window->mZoomScaleDem ? ((long)(s18 * (float)window->mZoomScaleNum / (float)window->mZoomScaleDem)) : s18;
+
+			GetWindowRect(window->mWndToolbar, &rect);
+			MoveWindow(window->mWndMargin, 0, 0, window->mMarginSize, window->mSplitterRect.top - (rect.bottom - rect.top), TRUE);
+			// FIXME: was *2.25, increased for line numbers up to 9999; but neither works particularly well
+			//        if DPI scaling is involved, because script code text and linenumbers aren't DPI scaled
+			int lmargin = window->GetMarginWidth();
+			SendMessage(window->mWndScript, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG(lmargin, s10));
+
+		}
 	}
 
 	return CallWindowProc ( wndproc, wnd, msg, wparam, lparam );
@@ -354,7 +374,7 @@ LRESULT CALLBACK rvDebuggerWindow::ScriptWndProc ( HWND wnd, UINT msg, WPARAM wp
 
 LRESULT CALLBACK rvDebuggerWindow::MarginWndProc ( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam )
 {
-	rvDebuggerWindow* window = (rvDebuggerWindow*) GetWindowLong ( wnd, GWL_USERDATA );
+	rvDebuggerWindow* window = (rvDebuggerWindow*) GetWindowLongPtr ( wnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -384,15 +404,57 @@ LRESULT CALLBACK rvDebuggerWindow::MarginWndProc ( HWND wnd, UINT msg, WPARAM wp
 		case WM_PAINT:
 		{
 			HDC dc;
+			float scaling_factor = Win_GetWindowScalingFactor(wnd);
+			int s2 = int(2 * scaling_factor);
+			int s4 = int(4 * scaling_factor);
+			int width,height;
 
-			int size = window->mMarginSize - 2;
+			window->ResizeImageList(width,height);
 
 			PAINTSTRUCT ps;
 			RECT rect;
 			GetClientRect ( wnd, &rect );
-			dc = BeginPaint ( wnd, &ps );
-			FillRect ( dc, &rect, GetSysColorBrush ( COLOR_3DFACE ) );
+			dc = BeginPaint( wnd, &ps );
+			FillRect( dc, &rect, GetSysColorBrush( COLOR_3DSHADOW ) );
+
+			// draw line numbers
+			int iMaxNumberOfLines = ( (rect.bottom - rect.top ) / height ) + height;
+			int iFirstVisibleLine = SendMessage( window->mWndScript, EM_GETFIRSTVISIBLELINE, 0, 0 );
+			HFONT hf = CreateFont( height, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, "Courier New" );
+			HFONT hfOld = ( HFONT ) SelectObject( dc, hf );
+			SetBkMode( dc, OPAQUE );
+			// I think it looks nicer when the line number background is white
+			SetBkColor( dc, RGB( 255, 255, 255 ) );
+			SetTextColor( dc, RGB( 0, 0, 255 ) );
+			
+			int lnrWidth = 8;
+			GetCharWidth32( dc, '9', '9', &lnrWidth );
+			lnrWidth *= 4; // we want enough width for 4 chars ("9999"), not just one
+			lnrWidth += 2 * s4; // we want some space around the line number
+
+			RECT lnrRect = rect;
+			lnrRect.left = rect.right;
+			lnrRect.right = lnrRect.left + lnrWidth;
+			FillRect( dc, &lnrRect, (HBRUSH)GetStockObject( WHITE_BRUSH ) ); // WHITE_BRUSH is a stock object index, not a brush handle
+
+			for (int i = 0; i < iMaxNumberOfLines; ++i )
+			{
+				int		c;
+				POINTL	pos;
+				c = SendMessage( window->mWndScript, EM_LINEINDEX, iFirstVisibleLine + i , 0 );
+				SendMessage( window->mWndScript, EM_POSFROMCHAR, ( WPARAM ) &pos, c );
+
+				RECT t = lnrRect;
+				t.top = pos.y;
+				t.bottom = t.top + height;
+				t.right -= s4; // a little space between text and "border" to code part of window
+				
+				idStr lntxt( iFirstVisibleLine + i + 1);
+				DrawText( dc, lntxt, lntxt.Length(), &t, DT_RIGHT );
+			}
+			DeleteObject( SelectObject( dc, hfOld ) ); // reselect the old font (returns hf), then delete hf
 
+			//draw breakpoints
 			if ( window->mScripts.Num ( ) )
 			{
 				for ( int i = 0; i < window->mClient->GetBreakpointCount(); i ++ )
@@ -407,7 +469,8 @@ LRESULT CALLBACK rvDebuggerWindow::MarginWndProc ( HWND wnd, UINT msg, WPARAM wp
 
 						c = SendMessage ( window->mWndScript, EM_LINEINDEX, bp->GetLineNumber ( ) - 1, 0 );
 						SendMessage ( window->mWndScript, EM_POSFROMCHAR, (WPARAM)&pos, c );
-						ImageList_DrawEx ( window->mImageList, 2, dc, rect.left, pos.y, size, size, CLR_NONE, CLR_NONE, ILD_NORMAL );
+						ImageList_DrawEx ( window->mTmpImageList, 2, dc, rect.left, pos.y, width, height, CLR_NONE, CLR_NONE, ILD_NORMAL );
+
 					}
 				}
 
@@ -421,7 +484,7 @@ LRESULT CALLBACK rvDebuggerWindow::MarginWndProc ( HWND wnd, UINT msg, WPARAM wp
 
 						c = SendMessage ( window->mWndScript, EM_LINEINDEX, window->mClient->GetBreakLineNumber() - 1, 0 );
 						SendMessage ( window->mWndScript, EM_POSFROMCHAR, (WPARAM)&pos, c );
-						ImageList_DrawEx ( window->mImageList, 3, dc, rect.left, pos.y, size, size, CLR_NONE, CLR_NONE, ILD_NORMAL );
+						ImageList_DrawEx ( window->mTmpImageList, 3, dc, rect.left, pos.y, width, height, CLR_NONE, CLR_NONE, ILD_NORMAL );
 					}
 				}
 
@@ -434,17 +497,19 @@ LRESULT CALLBACK rvDebuggerWindow::MarginWndProc ( HWND wnd, UINT msg, WPARAM wp
 
 						c = SendMessage ( window->mWndScript, EM_LINEINDEX, window->mClient->GetCallstack()[window->mCurrentStackDepth]->mLineNumber - 1, 0 );
 						SendMessage ( window->mWndScript, EM_POSFROMCHAR, (WPARAM)&pos, c );
-						ImageList_DrawEx ( window->mImageList, 1, dc, rect.left, pos.y, size, size, CLR_NONE, CLR_NONE, ILD_NORMAL );
+						ImageList_DrawEx ( window->mTmpImageList, 1, dc, rect.left, pos.y, width, height, CLR_NONE, CLR_NONE, ILD_NORMAL );
 					}
 				}
 			}
+			RECT tmp = rect;
 
-			rect.right-=2;
-			rect.left = rect.right + 1;
-			HPEN pen = CreatePen ( PS_SOLID, 1, GetSysColor ( COLOR_3DSHADOW ) );
+			rect.right -= s2;
+			rect.left = rect.right + s2;
+			HPEN pen = CreatePen ( PS_SOLID, s2, GetSysColor ( COLOR_BACKGROUND ) );
 			HPEN old = (HPEN)SelectObject ( dc, pen );
 			MoveToEx ( dc, rect.right, rect.top, NULL );
 			LineTo ( dc, rect.right, rect.bottom );
+			
 			SelectObject ( dc, old );
 			DeleteObject ( pen );
 			EndPaint ( wnd, &ps );
@@ -466,7 +531,7 @@ void rvDebuggerWindow::UpdateTitle ( void )
 {
 	idStr title;
 
-	title = "Quake 4 Script Debugger - ";
+	title = "Dhewm3 Script Debugger - ";
 
 	if ( mClient->IsConnected ( ) )
 	{
@@ -487,7 +552,10 @@ void rvDebuggerWindow::UpdateTitle ( void )
 	if ( mScripts.Num ( ) )
 	{
 		title += " - [";
-		title += idStr( mScripts[mActiveScript]->GetFilename() ).StripPath ( );
+		if (mActiveScript != -1)
+			title += idStr( mScripts[mActiveScript]->GetFilename() ).StripPath ( );
+		else
+			title += "Load Error";
 		title += "]";
 	}
 
@@ -590,6 +658,56 @@ void rvDebuggerWindow::UpdateCallstack ( void )
 	}
 }
 
+void rvDebuggerWindow::UpdateScriptList(void)
+{
+	LVITEM item;
+	ListView_DeleteAllItems(mWndScriptList);
+	ZeroMemory(&item, sizeof(item));
+	item.mask = LVIF_TEXT | LVIF_IMAGE;
+
+	idStrList& scripts = mClient->GetServerScripts();
+	for (int i = 0; i < scripts.Num(); i++)
+	{
+		item.iItem = ListView_GetItemCount(mWndScriptList);
+		item.pszText = "";
+		//find in activeScripts
+		item.iImage = 0;
+		for (int j = 0; j < mScripts.Num(); j++)
+		{
+			if (!idStr::Icmp(mScripts[j]->GetFilename(), scripts[i]))
+			{
+				item.iImage = 1;
+				break;
+			}
+		}
+		ListView_InsertItem(mWndScriptList, &item);
+		ListView_SetItemText(mWndScriptList, item.iItem, 1, (LPSTR)scripts[i].c_str());
+	}
+}
+
+
+void rvDebuggerWindow::UpdateBreakpointList( void )
+{
+	LVITEM item;
+	ListView_DeleteAllItems( mWndBreakList );
+	ZeroMemory( &item, sizeof( item ) );
+	item.mask = LVIF_TEXT | LVIF_IMAGE;
+
+	int numBreakPoints = mClient->GetBreakpointCount();
+	for ( int i = 0; i < numBreakPoints; i++ )
+	{
+		rvDebuggerBreakpoint* bp = mClient->GetBreakpoint( i );
+		item.iItem = ListView_GetItemCount( mWndBreakList );
+		item.pszText = "";
+		item.iImage = 2; // breakpoint
+		ListView_InsertItem( mWndBreakList, &item );
+		
+		idStr lineStr( bp->GetLineNumber() );
+		ListView_SetItemText( mWndBreakList, item.iItem, 1, (LPSTR)bp->GetFilename() );
+		ListView_SetItemText( mWndBreakList, item.iItem, 2, (LPSTR)lineStr.c_str() );
+	}
+}
+
 /*
 ================
 rvDebuggerWindow::UpdateWatch
@@ -712,7 +830,7 @@ int rvDebuggerWindow::HandleInitMenu ( WPARAM wParam, LPARAM lParam )
 			case ID_DBG_DEBUG_STEPOVER:
 			case ID_DBG_DEBUG_STEPINTO:
 			case ID_DBG_DEBUG_SHOWNEXTSTATEMENT:
-//			case ID_DBG_DEBUG_QUICKWATCH:
+			case ID_DBG_DEBUG_QUICKWATCH:
 				if ( !mClient->IsConnected() || !mClient->IsStopped() )
 				{
 					EnableMenuItem ( hmenu, nPos, MF_GRAYED|MF_BYPOSITION );
@@ -737,6 +855,46 @@ int rvDebuggerWindow::HandleInitMenu ( WPARAM wParam, LPARAM lParam )
 	return 0;
 }
 
+
+void rvDebuggerWindow::ResizeImageList(int& widthOut, int& heightOut)
+{
+	//mTmpImageList
+	float scaling_factor = Win_GetWindowScalingFactor(mWnd);
+	int s16 = int(16 * scaling_factor);
+
+	TEXTMETRIC	tm;
+	HDC			dc;
+	dc = GetDC(mWndScript);
+
+	GetTextMetrics(dc, &tm);
+	int height = mZoomScaleDem ? (tm.tmHeight * (float)mZoomScaleNum / (float)mZoomScaleDem)  : tm.tmHeight ;
+	height *= scaling_factor;
+	int width = mZoomScaleDem ? ( tm.tmMaxCharWidth * (float)mZoomScaleNum / (float)mZoomScaleDem) : tm.tmMaxCharWidth;
+	width *= scaling_factor;
+
+	ImageList_Destroy(mTmpImageList);
+	mTmpImageList = ImageList_Create(width, height, ILC_COLOR | ILC_MASK , 0, 2);
+	ImageList_AddIcon(mTmpImageList, (HICON)LoadImage(mInstance, MAKEINTRESOURCE(IDI_DBG_EMPTY), IMAGE_ICON, width, height, LR_DEFAULTSIZE | LR_DEFAULTCOLOR));
+	ImageList_AddIcon(mTmpImageList, (HICON)LoadImage(mInstance, MAKEINTRESOURCE(IDI_DBG_CURRENT), IMAGE_ICON, width, height, LR_DEFAULTSIZE | LR_DEFAULTCOLOR));
+	ImageList_AddIcon(mTmpImageList, (HICON)LoadImage(mInstance, MAKEINTRESOURCE(IDI_DBG_BREAKPOINT), IMAGE_ICON, width, height, LR_DEFAULTSIZE | LR_DEFAULTCOLOR));
+	ImageList_AddIcon(mTmpImageList, (HICON)LoadImage(mInstance, MAKEINTRESOURCE(IDI_DBG_CURRENTLINE), IMAGE_ICON, width, height, LR_DEFAULTSIZE | LR_DEFAULTCOLOR));
+
+	widthOut = width;
+	heightOut = height;
+}
+
+float rvDebuggerWindow::GetMarginWidth ( void )
+{
+	TEXTMETRIC	tm;
+	HDC			dc;
+
+	dc = GetDC( mWndScript );
+	GetTextMetrics( dc, &tm );
+
+	float scaling_factor = Win_GetWindowScalingFactor( mWndScript );
+
+	return scaling_factor * (4 * tm.tmMaxCharWidth + tm.tmMaxCharWidth);
+}
 /*
 ================
 rvDebuggerWindow::HandleCreate
@@ -760,37 +918,44 @@ int rvDebuggerWindow::HandleCreate ( WPARAM wparam, LPARAM lparam )
 	// Create the script window
 	LoadLibrary ( "Riched20.dll" );
 	mWndScript = CreateWindow ( "RichEdit20A", "", WS_CHILD|WS_BORDER|ES_NOHIDESEL|ES_READONLY|ES_MULTILINE|ES_WANTRETURN|ES_AUTOVSCROLL|ES_AUTOHSCROLL|WS_VSCROLL|WS_HSCROLL, 0, 0, 100, 100, mWnd, (HMENU) IDC_DBG_SCRIPT, mInstance, 0 );
-	SendMessage ( mWndScript, EM_SETEVENTMASK, 0, ENM_SCROLL|ENM_CHANGE  );
+	SendMessage ( mWndScript, EM_SETEVENTMASK, 0, ENM_SCROLL | ENM_CHANGE | ENM_UPDATE | ENM_SCROLLEVENTS | ENM_REQUESTRESIZE) ;
 	SendMessage ( mWndScript, EM_SETWORDBREAKPROC, 0, (LPARAM) ScriptWordBreakProc );
-	mOldScriptProc = (WNDPROC)GetWindowLong ( mWndScript, GWL_WNDPROC );
-	SetWindowLong ( mWndScript, GWL_USERDATA, (LONG)this );
-	SetWindowLong ( mWndScript, GWL_WNDPROC, (LONG)ScriptWndProc );
+	mOldScriptProc = (WNDPROC)GetWindowLongPtr ( mWndScript, GWLP_WNDPROC );
+	SetWindowLongPtr ( mWndScript, GWLP_USERDATA, (LONG_PTR)this );
+	SetWindowLongPtr ( mWndScript, GWLP_WNDPROC, (LONG_PTR)ScriptWndProc );
 
 	SendMessage ( mWndScript, EM_SETTABSTOPS, 1, (LPARAM)&tabsize );
 
 	dc = GetDC ( mWndScript );
 	GetTextMetrics ( dc, &tm );
 	ZeroMemory ( &lf, sizeof(lf) );
-	lf.lfHeight = tm.tmHeight;
+	lf.lfHeight = tm.tmHeight * Win_GetWindowScalingFactor( mWndScript );
 	strcpy ( lf.lfFaceName, "Courier New" );
 
 	SendMessage ( mWndScript, WM_SETFONT, (WPARAM)CreateFontIndirect ( &lf ), 0 );
-	SendMessage ( mWndScript, EM_SETMARGINS, EC_LEFTMARGIN|EC_RIGHTMARGIN, MAKELONG(18,10) );
+	SendMessage ( mWndScript, EM_SETMARGINS, EC_LEFTMARGIN|EC_RIGHTMARGIN, MAKELONG( GetMarginWidth(),10) );
 	SendMessage ( mWndScript, EM_SETBKGNDCOLOR, 0, GetSysColor ( COLOR_3DFACE ) );
 
 	mWndOutput = CreateWindow ( "RichEdit20A", "", WS_CHILD|ES_READONLY|ES_MULTILINE|ES_WANTRETURN|ES_AUTOVSCROLL|ES_AUTOHSCROLL|WS_VSCROLL|WS_HSCROLL|WS_VISIBLE, 0, 0, 100, 100, mWnd, (HMENU) IDC_DBG_OUTPUT, mInstance, 0 );
 	SendMessage ( mWndOutput, WM_SETFONT, (WPARAM)CreateFontIndirect ( &lf ), 0 );
 	SendMessage ( mWndOutput, EM_SETMARGINS, EC_LEFTMARGIN|EC_RIGHTMARGIN, MAKELONG(18,10) );
 	SendMessage ( mWndOutput, EM_SETBKGNDCOLOR, 0, GetSysColor ( COLOR_3DFACE ) );
+	SendMessage ( mWndOutput, EM_SETEVENTMASK, 0, ENM_SCROLL | ENM_CHANGE | ENM_UPDATE | ENM_SCROLLEVENTS);
 
 	mWndConsole = CreateWindow ( "RichEdit20A", "", WS_CHILD|ES_READONLY|ES_MULTILINE|ES_WANTRETURN|ES_AUTOVSCROLL|ES_AUTOHSCROLL|WS_VSCROLL|WS_HSCROLL, 0, 0, 100, 100, mWnd, (HMENU) IDC_DBG_CONSOLE, mInstance, 0 );
 	SendMessage ( mWndConsole, WM_SETFONT, (WPARAM)CreateFontIndirect ( &lf ), 0 );
 	SendMessage ( mWndConsole, EM_SETMARGINS, EC_LEFTMARGIN|EC_RIGHTMARGIN, MAKELONG(18,10) );
 	SendMessage ( mWndConsole, EM_SETBKGNDCOLOR, 0, GetSysColor ( COLOR_3DFACE ) );
 
+	mWndConsoleInput = CreateWindow( "RichEdit20A", "", WS_CHILD | ES_WANTRETURN | ES_AUTOVSCROLL |  WS_VSCROLL | WS_BORDER, 0, 0, 100, 18, mWnd, ( HMENU ) IDC_DBG_CONSOLEINPUT, mInstance, 0 );
+	lf.lfHeight = -MulDiv( 8, GetDeviceCaps( dc, LOGPIXELSY ), 72 );
+	strcpy( lf.lfFaceName, "Arial" );
+	SendMessage( mWndConsoleInput, WM_SETFONT, ( WPARAM ) CreateFontIndirect( &lf ), 0 );
+	SendMessage( mWndConsoleInput, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG( 18, 10 ) );
+
 	mWndMargin = CreateWindow ( "STATIC", "", WS_VISIBLE|WS_CHILD, 0, 0, 0, 0, mWndScript, (HMENU)IDC_DBG_SPLITTER, mInstance, NULL );
-	SetWindowLong ( mWndMargin, GWL_USERDATA, (LONG)this );
-	SetWindowLong ( mWndMargin, GWL_WNDPROC, (LONG)MarginWndProc );
+	SetWindowLongPtr ( mWndMargin, GWLP_USERDATA, (LONG_PTR)this );
+	SetWindowLongPtr ( mWndMargin, GWLP_WNDPROC, (LONG_PTR)MarginWndProc );
 
 	mWndBorder = CreateWindow ( "STATIC", "", WS_VISIBLE|WS_CHILD|SS_GRAYFRAME, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_BORDER, mInstance, NULL );
 
@@ -816,23 +981,52 @@ int rvDebuggerWindow::HandleCreate ( WPARAM wparam, LPARAM lparam )
 	TabCtrl_InsertItem ( mWndTabs, 3, &item );
 	item.pszText = "Threads";
 	TabCtrl_InsertItem ( mWndTabs, 4, &item );
+	item.pszText = "Scripts";
+	TabCtrl_InsertItem ( mWndTabs, 5, &item );
+	item.pszText = "Breakpoints";
+	TabCtrl_InsertItem ( mWndTabs, 6, &item );
 
 	mWndCallstack = CreateWindow ( WC_LISTVIEW, "", LVS_REPORT|WS_CHILD|LVS_SHAREIMAGELISTS, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_CALLSTACK, mInstance, NULL );
 	mWndWatch     = CreateWindow ( WC_LISTVIEW, "", LVS_REPORT|WS_CHILD|LVS_EDITLABELS|LVS_OWNERDRAWFIXED, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_WATCH, mInstance, NULL );
 	mWndThreads   = CreateWindow ( WC_LISTVIEW, "", LVS_REPORT|WS_CHILD|LVS_SHAREIMAGELISTS, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_THREADS, mInstance, NULL );
+	mWndScriptList = CreateWindow( WC_LISTVIEW, "", LVS_REPORT|WS_CHILD|LVS_SHAREIMAGELISTS, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_SCRIPTLIST, mInstance, NULL );
+	mWndBreakList =  CreateWindow( WC_LISTVIEW, "", LVS_REPORT|WS_CHILD|LVS_SHAREIMAGELISTS, 0, 0, 0, 0, mWnd, (HMENU)IDC_DBG_BREAKLIST, mInstance, NULL );
 
 	LVCOLUMN col;
 	col.mask = LVCF_WIDTH|LVCF_TEXT;
+
+	col.cx = 20;
+	col.pszText = "";
+	ListView_InsertColumn( mWndBreakList, 0, &col );
+#if 0 // TODO: figure out how to get the function name in UpdateBreakpointList()
+	col.cx = 150;
+	col.pszText = "Function";
+	ListView_InsertColumn( mWndBreakList, 1, &col );
+#endif
+	col.cx = 350;
+	col.pszText = "Filename";
+	ListView_InsertColumn( mWndBreakList, 1, &col );
+	col.cx = 50;
+	col.pszText = "Line";
+	ListView_InsertColumn( mWndBreakList, 2, &col );
+
+	col.cx = 20;
+	col.pszText = "";
+	ListView_InsertColumn ( mWndScriptList, 0, &col);
+	col.cx = 350;
+	col.pszText = "Filename";
+	ListView_InsertColumn ( mWndScriptList, 1, &col );
+
 	col.cx = 20;
 	col.pszText = "";
 	ListView_InsertColumn ( mWndCallstack, 0, &col );
 	col.cx = 150;
 	col.pszText = "Function";
 	ListView_InsertColumn ( mWndCallstack, 1, &col );
-	col.cx = 150;
+	col.cx = 50;
 	col.pszText = "Line";
 	ListView_InsertColumn ( mWndCallstack, 2, &col );
-	col.cx = 150;
+	col.cx = 350;
 	col.pszText = "Filename";
 	ListView_InsertColumn ( mWndCallstack, 3, &col );
 
@@ -863,13 +1057,21 @@ int rvDebuggerWindow::HandleCreate ( WPARAM wparam, LPARAM lparam )
 	ImageList_AddIcon ( mImageList, (HICON)LoadImage ( mInstance, MAKEINTRESOURCE(IDI_DBG_CURRENT), IMAGE_ICON, 16, 16, LR_DEFAULTSIZE|LR_DEFAULTCOLOR) );
 	ImageList_AddIcon ( mImageList, (HICON)LoadImage ( mInstance, MAKEINTRESOURCE(IDI_DBG_BREAKPOINT), IMAGE_ICON, 16, 16, LR_DEFAULTSIZE|LR_DEFAULTCOLOR) );
 	ImageList_AddIcon ( mImageList, (HICON)LoadImage ( mInstance, MAKEINTRESOURCE(IDI_DBG_CURRENTLINE), IMAGE_ICON, 16, 16, LR_DEFAULTSIZE|LR_DEFAULTCOLOR) );
+	
+	int w, h;
+	ResizeImageList(w, h);
+	ListView_SetImageList ( mWndScriptList, mTmpImageList, LVSIL_SMALL );
 	ListView_SetImageList ( mWndThreads, mImageList, LVSIL_SMALL );
 	ListView_SetImageList ( mWndCallstack, mImageList, LVSIL_SMALL );
+	ListView_SetImageList ( mWndBreakList, mImageList, LVSIL_SMALL );
 
 	EnableWindows ( FALSE );
-
+	EnableWindow ( mWndScriptList, true );
+	
 	ListView_SetExtendedListViewStyle ( mWndCallstack, LVS_EX_FULLROWSELECT );
 	ListView_SetExtendedListViewStyle ( mWndThreads, LVS_EX_FULLROWSELECT );
+	ListView_SetExtendedListViewStyle ( mWndScriptList, LVS_EX_FULLROWSELECT );
+	ListView_SetExtendedListViewStyle ( mWndBreakList, LVS_EX_FULLROWSELECT );
 
 	gDebuggerApp.GetOptions().GetColumnWidths ( "cw_callstack", mWndCallstack );
 	gDebuggerApp.GetOptions().GetColumnWidths ( "cw_threads", mWndThreads );
@@ -920,6 +1122,10 @@ int rvDebuggerWindow::HandleCreate ( WPARAM wparam, LPARAM lparam )
 		AddWatch ( s );
 	}
 
+	RECT t;
+	GetClientRect(mWndScript, &t);
+	SendMessage(mWndScript, WM_SIZE, 0, MAKELPARAM(t.right - t.left, t.bottom - t.top));
+
 	return 0;
 }
 
@@ -957,6 +1163,36 @@ int rvDebuggerWindow::HandleCommand ( WPARAM wparam, LPARAM lparam )
 
 	switch ( id )
 	{
+		case ID_DBG_SEND_COMMAND:
+		{
+			if ( mClient->IsConnected( ) && GetFocus( ) == mWndConsoleInput ) {
+				GETTEXTLENGTHEX textLen;
+				int				chars;
+				textLen.flags = GTL_DEFAULT | GTL_USECRLF;
+				textLen.codepage = CP_ACP;
+				chars = SendMessage( mWndConsoleInput, EM_GETTEXTLENGTHEX, ( WPARAM ) &textLen, 0 );
+
+				char *text = new char[chars + 1];
+
+				GETTEXTEX getText;
+				getText.cb = chars + 1;
+				getText.codepage = CP_ACP;
+				getText.flags = GT_DEFAULT | GT_USECRLF;
+				getText.lpDefaultChar = NULL;
+				getText.lpUsedDefChar = NULL;
+				SendMessage( mWndConsoleInput, EM_GETTEXTEX, ( WPARAM ) &getText, ( LPARAM ) text );
+				idStr parse = text;
+				delete[] text;
+
+				mClient->SendCommand( parse.c_str() );
+
+				SendMessage( mWndConsoleInput, EM_SETSEL, 0, -1 );
+				SendMessage( mWndConsoleInput, EM_REPLACESEL, FALSE, ( LPARAM ) "" );
+				UpdateWindow( mWndConsoleInput );
+			}
+			break;
+		}
+
 		case ID_DBG_EDIT_FINDSELECTED:
 		{
 			idStr text;
@@ -1025,7 +1261,7 @@ int rvDebuggerWindow::HandleCommand ( WPARAM wparam, LPARAM lparam )
 				GetCurrentDirectory ( MAX_PATH, curDir );
 
 				GetModuleFileName ( NULL, exeFile, MAX_PATH );
-				const char* s = va("%s +set fs_game %s +set fs_cdpath %s", exeFile, cvarSystem->GetCVarString( "fs_game" ), cvarSystem->GetCVarString( "fs_cdpath" ) );
+				const char* s = va("%s +set fs_game %s +set fs_cdpath %s +set com_enableDebuggerServer 1", exeFile, cvarSystem->GetCVarString( "fs_game" ), cvarSystem->GetCVarString( "fs_cdpath" ) );
 				CreateProcess ( NULL, (LPSTR)s,
 				NULL, NULL, FALSE, 0, NULL, curDir, &startup, &process );
 
@@ -1049,13 +1285,13 @@ int rvDebuggerWindow::HandleCommand ( WPARAM wparam, LPARAM lparam )
 			LONG	num;
 			LONG	dem;
 
-			SendMessage ( mWndScript, EM_GETZOOM, (LONG)&num, (LONG)&dem );
+			SendMessage ( mWndScript, EM_GETZOOM, (WPARAM)&num, (LPARAM)&dem );
 			if ( num != mZoomScaleNum || dem != mZoomScaleDem )
 			{
 				mZoomScaleNum = num;
 				mZoomScaleDem = dem;
 				GetClientRect ( mWndScript, &t );
-				SendMessage ( mWnd, WM_SIZE, 0, MAKELPARAM(t.right-t.left,t.bottom-t.top) );
+				SendMessage ( mWndScript, WM_SIZE, 0, MAKELPARAM(t.right-t.left ,t.bottom-t.top) );
 			}
 			else
 			{
@@ -1064,6 +1300,7 @@ int rvDebuggerWindow::HandleCommand ( WPARAM wparam, LPARAM lparam )
 			break;
 		}
 
+		case 111: // DG: Debugger.rc has 'MENUITEM "Toggle &Breakpoint\tF9", 111' for the context menu; no idea why 111, but this works
 		case ID_DBG_DEBUG_TOGGLEBREAKPOINT:
 			ToggleBreakpoint ( );
 			break;
@@ -1139,6 +1376,23 @@ int rvDebuggerWindow::HandleCommand ( WPARAM wparam, LPARAM lparam )
 			UpdateScript ( );
 			break;
 		}
+
+		// DG: support "Run To Cursor" from context menu
+		case ID_DBG_DEBUG_RUNTOCURSOR:
+		{
+			// Find the currently selected line
+			DWORD sel;
+			SendMessage( mWndScript, EM_GETSEL, (WPARAM)&sel, 0 );
+			int lineNumber = SendMessage( mWndScript, EM_LINEFROMCHAR, sel, 0 ) + 1;
+
+			const char* filename = mScripts[mActiveScript]->GetFilename();
+			mClient->AddBreakpoint( filename, lineNumber, true );
+			mClient->Resume();
+			break;
+		}
+
+		// TODO: case ID_DBG_DEBUG_SHOWNEXTSTATEMENT:
+		//       whatever this is supposed to do (also from context menu)
 	}
 
 	return 0;
@@ -1153,7 +1407,7 @@ Window procedure for the deubgger window
 */
 LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam )
 {
-	rvDebuggerWindow* window = (rvDebuggerWindow*) GetWindowLong ( wnd, GWL_USERDATA );
+	rvDebuggerWindow* window = (rvDebuggerWindow*) GetWindowLongPtr ( wnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -1176,7 +1430,7 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 			gDebuggerApp.GetOptions().SetString ( va("watch%d", i ), "" );
 
 			window->mWnd = NULL;
-			SetWindowLong ( wnd, GWL_USERDATA, 0 );
+			SetWindowLongPtr ( wnd, GWLP_USERDATA, 0 );
 			break;
 		}
 
@@ -1202,8 +1456,13 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 
 		case WM_SIZE:
 		{
+			float scaling_factor = Win_GetWindowScalingFactor(wnd);
+			int s18 = int(18 * scaling_factor);
+			int s4 = int(4 * scaling_factor);
+			int s10 = int(10 * scaling_factor);
+
 			RECT rect;
-			window->mMarginSize = window->mZoomScaleDem ? ((long)(18.0f * (float)window->mZoomScaleNum / (float)window->mZoomScaleDem)):18;
+			window->mMarginSize = window->mZoomScaleDem ? ((long)(s18 * (float)window->mZoomScaleNum / (float)window->mZoomScaleDem)): s18;
 			window->mSplitterRect.left = 0;
 			window->mSplitterRect.right = LOWORD(lparam);
 
@@ -1215,14 +1474,27 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 			SetRect ( &rect, 0, window->mSplitterRect.bottom, LOWORD(lparam), HIWORD(lparam) );
 			MoveWindow ( window->mWndTabs, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
 			SendMessage ( window->mWndTabs, TCM_ADJUSTRECT, FALSE, (LPARAM)&rect );
-			rect.bottom -= 4 ;
+			rect.bottom -= s4;
 			MoveWindow ( window->mWndBorder, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
 			InflateRect ( &rect, -1, -1 );
 			MoveWindow ( window->mWndOutput, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
-			MoveWindow ( window->mWndConsole, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
+			MoveWindow ( window->mWndConsole, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top - s18, TRUE );
+			MoveWindow ( window->mWndConsoleInput, rect.left, rect.bottom-s18, rect.right - rect.left, s18, TRUE );
 			MoveWindow ( window->mWndCallstack, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
 			MoveWindow ( window->mWndWatch, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
 			MoveWindow ( window->mWndThreads, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
+			MoveWindow ( window->mWndScriptList, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
+			MoveWindow ( window->mWndBreakList, rect.left, rect.top, rect.right-rect.left, rect.bottom-rect.top, TRUE );
+
+			// FIXME: was *2.25, increased for line numbers up to 9999; but neither works particularly well
+			//        if DPI scaling is involved, because script code text and linenumbers aren't DPI scaled
+			int lmargin = window->GetMarginWidth();
+			SendMessage(window->mWndScript, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG(lmargin, s10));
+			SendMessage(window->mWndCallstack, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG(s18, s10));
+			SendMessage(window->mWndOutput, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG(s18, s10));
+			SendMessage(window->mWndConsole, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG(s18, s10));
+			SendMessage( window->mWndConsoleInput, EM_SETMARGINS, EC_LEFTMARGIN | EC_RIGHTMARGIN, MAKELONG( s18, s10 ) );
+
 			break;
 		}
 
@@ -1258,7 +1530,17 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 			}
 			break;
 		}
+		case WM_MOUSEWHEEL:
+		{
+			HDC dc = GetDC(wnd);
+			DrawFocusRect(dc, &window->mSplitterRect);
+			ReleaseDC(wnd, dc);
+
+			RECT client;
+			GetClientRect(wnd, &client);
+			SendMessage(wnd, WM_SIZE, 0, MAKELPARAM(client.right - client.left, client.bottom - client.top));
 
+		} // no break here: execution falls through to WM_LBUTTONUP
 		case WM_LBUTTONUP:
 			if ( window->mSplitterDrag )
 			{
@@ -1320,7 +1602,7 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 		{
 			CREATESTRUCT* cs = (CREATESTRUCT*) lparam;
 			window = (rvDebuggerWindow*) cs->lpCreateParams;
-			SetWindowLong ( wnd, GWL_USERDATA, (LONG)cs->lpCreateParams );
+			SetWindowLongPtr ( wnd, GWLP_USERDATA, (LONG_PTR)cs->lpCreateParams );
 
 			window->mWnd = wnd;
 			window->HandleCreate ( wparam, lparam );
@@ -1439,7 +1721,6 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 						}
 					}
 					break;
-
 				case IDC_DBG_CALLSTACK:
 					if ( hdr->code == NM_DBLCLK )
 					{
@@ -1454,14 +1735,74 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 						}
 					}
 					break;
+				case IDC_DBG_SCRIPTLIST:
+					if ( hdr->code == NM_DBLCLK )
+					{
+						int sel = ListView_GetNextItem(hdr->hwndFrom, -1, LVNI_SELECTED);
+
+						if (sel != -1)
+						{
+							LVITEM item = { 0 }; 
+							char   temp[1024] = { 0 };
+							item.mask = LVIF_TEXT;
+							item.pszText = temp;
+							item.cchTextMax = sizeof(temp) - 1;
+							item.iSubItem = 1;
+							item.iItem = sel;
+
+							ListView_GetItem(hdr->hwndFrom, &item);
+
+							if (strlen(item.pszText) > 0)
+							{
+								window->OpenScript(item.pszText);
+								window->UpdateScriptList();
+							}
+						}
+					}
+					break;
+
+				case IDC_DBG_BREAKLIST:
+					if ( hdr->code == NM_DBLCLK || hdr->code == NM_CLICK ) {
+						LPNMITEMACTIVATE ia = (LPNMITEMACTIVATE)lparam;
+						int sel = ia->iItem;
+						if ( sel != -1 ) {
+							rvDebuggerBreakpoint* bp = window->mClient->GetBreakpoint( sel );
+							if ( bp != NULL ) {
+								if ( hdr->code == NM_DBLCLK ) {
+									// double clicked breakpoint => show it in its file
+									window->OpenScript( bp->GetFilename(), bp->GetLineNumber() - 1 );
+								} else if( ia->iSubItem == 0 ) {
+									// clicked breakpoint symbol => delete breakpoint
+									window->mClient->RemoveBreakpoint( bp->GetID() );
+									window->UpdateBreakpointList();
+								}
+							}
+						}
+					} else if ( hdr->code == LVN_KEYDOWN ) {
+						// when the user selects a breakpoint and presses the Del key, remove the breakpoint
+						int sel = ListView_GetNextItem( hdr->hwndFrom, -1, LVNI_SELECTED );
+						if ( sel != -1 ) {
+							LPNMLVKEYDOWN kd = (LPNMLVKEYDOWN)lparam;
+							rvDebuggerBreakpoint* bp = window->mClient->GetBreakpoint( sel );
+							if ( kd->wVKey == VK_DELETE && bp != NULL ) {
+								window->mClient->RemoveBreakpoint( bp->GetID() );
+								window->UpdateBreakpointList();
+							}
+						}
+					}
+					break;
+
 				case IDC_DBG_TABS:
 					if ( hdr->code == TCN_SELCHANGE )
 					{
 						ShowWindow ( window->mWndOutput, SW_HIDE );
 						ShowWindow ( window->mWndConsole, SW_HIDE );
+						ShowWindow ( window->mWndConsoleInput, SW_HIDE );
 						ShowWindow ( window->mWndCallstack, SW_HIDE );
 						ShowWindow ( window->mWndWatch, SW_HIDE );
 						ShowWindow ( window->mWndThreads, SW_HIDE );
+						ShowWindow ( window->mWndScriptList, SW_HIDE );
+						ShowWindow ( window->mWndBreakList, SW_HIDE );
 						switch ( TabCtrl_GetCurSel ( hdr->hwndFrom ) )
 						{
 							case 0:
@@ -1470,6 +1811,7 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 
 							case 1:
 								ShowWindow ( window->mWndConsole, SW_SHOW );
+								ShowWindow( window->mWndConsoleInput, SW_SHOW );
 								break;
 
 							case 2:
@@ -1483,6 +1825,14 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 							case 4:
 								ShowWindow ( window->mWndThreads, SW_SHOW );
 								break;
+
+							case 5:
+								ShowWindow(window->mWndScriptList, SW_SHOW);
+								break;
+
+							case 6:
+								ShowWindow( window->mWndBreakList, SW_SHOW );
+								break;
 						}
 					}
 					break;
@@ -1493,7 +1843,7 @@ LRESULT CALLBACK rvDebuggerWindow::WndProc ( HWND wnd, UINT msg, WPARAM wparam,
 		case WM_CLOSE:
 			if ( window->mClient->IsConnected ( ) )
 			{
-				if ( IDNO == MessageBox ( wnd, "The debugger is currently connected to a running version of the game.  Are you sure you want to close now?", "Quake 4 Script Debugger", MB_YESNO|MB_ICONQUESTION ) )
+				if ( IDNO == MessageBox ( wnd, "The debugger is currently connected to a running version of the game.  Are you sure you want to close now?", "Dhewm3 Script Debugger", MB_YESNO|MB_ICONQUESTION ) )
 				{
 					return 0;
 				}
@@ -1535,14 +1885,20 @@ rvDebuggerWindow::ProcessNetMessage
 Process an incoming network message
 ================
 */
-void rvDebuggerWindow::ProcessNetMessage ( msg_t* msg )
+void rvDebuggerWindow::ProcessNetMessage ( idBitMsg* msg )
 {
-	unsigned short command;
+	short command;
 
-	command = (unsigned short)MSG_ReadShort ( msg );
+	command = msg->ReadShort( );
 
 	switch ( command )
 	{
+		case DBMSG_REMOVEBREAKPOINT:
+			MessageBeep(MB_ICONEXCLAMATION);
+			InvalidateRect(mWndScript, NULL, FALSE);
+			UpdateBreakpointList();
+			break;
+
 		case DBMSG_RESUMED:
 			UpdateTitle ( );
 			UpdateToolbar ( );
@@ -1554,9 +1910,9 @@ void rvDebuggerWindow::ProcessNetMessage ( msg_t* msg )
 			char temp2[1024];
 			int	 i;
 
-			MSG_ReadShort ( msg );
-			MSG_ReadString ( msg, temp, 1024 );
-			MSG_ReadString ( msg, temp2, 1024 );
+			msg->ReadShort ( );
+			msg->ReadString (  temp, 1024 );
+			msg->ReadString (  temp2, 1024 );
 			if ( mTooltipVar.Icmp ( temp ) == 0 )
 			{
 				mTooltipValue = temp2;
@@ -1624,10 +1980,17 @@ void rvDebuggerWindow::ProcessNetMessage ( msg_t* msg )
 			break;
 
 		case DBMSG_PRINT:
+		{
+			HWND prevFocus = GetFocus();
+			SetFocus ( mWndConsole );
 			SendMessage ( mWndConsole, EM_SETSEL, -1, -1 );
-			SendMessage ( mWndConsole, EM_REPLACESEL, 0, (LPARAM)(const char*)(msg->data) + msg->readcount );
+			SendMessage ( mWndConsole, EM_REPLACESEL, 0, (LPARAM)(const char*)(msg->GetData()) + msg->GetReadCount() );
+			SendMessage( mWndConsole, EM_SETSEL, -1, -1 );
 			SendMessage ( mWndConsole, EM_SCROLLCARET, 0, 0 );
+			UpdateWindow( mWndConsole );
+			SetFocus( prevFocus );
 			break;
+		}
 
 		case DBMSG_BREAK:
 		{
@@ -1636,6 +1999,7 @@ void rvDebuggerWindow::ProcessNetMessage ( msg_t* msg )
 			mCurrentStackDepth = 0;
 			mClient->InspectVariable ( mTooltipVar, mCurrentStackDepth );
 			UpdateWatch ( );
+			UpdateBreakpointList();
 			EnableWindows ( TRUE );
 			OpenScript ( mClient->GetBreakFilename(), mClient->GetBreakLineNumber() - 1 );
 			UpdateTitle ( );
@@ -1643,7 +2007,11 @@ void rvDebuggerWindow::ProcessNetMessage ( msg_t* msg )
 			SetForegroundWindow ( mWnd );
 			break;
 		}
-
+		case DBMSG_INSPECTSCRIPTS:
+		{
+			UpdateScriptList ( );
+			break;
+		}
 		case DBMSG_INSPECTCALLSTACK:
 		{
 			UpdateCallstack ( );
@@ -1719,7 +2087,7 @@ Opens the script with the given filename and will scroll to the given line
 number if one is specified
 ================
 */
-bool rvDebuggerWindow::OpenScript ( const char* filename, int lineNumber )
+bool rvDebuggerWindow::OpenScript ( const char* filename, int lineNumber, idProgram* program )
 {
 	int i;
 
@@ -1748,7 +2116,6 @@ bool rvDebuggerWindow::OpenScript ( const char* filename, int lineNumber )
 		// Load the script
 		if ( !script->Load ( filename ) )
 		{
-			delete script;
 			SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 			return false;
 		}
@@ -1777,7 +2144,7 @@ bool rvDebuggerWindow::OpenScript ( const char* filename, int lineNumber )
 	// Move to a specific line number?
 	if ( lineNumber != -1 )
 	{
-		int		c;
+		long	c;
 
 		// Put the caret on the line number specified and scroll it into position.
 		// This is a bit of a hack since we set the selection twice, but setting the
@@ -1785,10 +2152,10 @@ bool rvDebuggerWindow::OpenScript ( const char* filename, int lineNumber )
 		// and then scroll before going back to (c,c).
 		// NOTE: We scroll to the line before the one we want so its more visible
 		SetFocus ( mWndScript );
-		c = SendMessage ( mWndScript, EM_LINEINDEX, lineNumber - 1, 0 );
+		c = SendMessage ( mWndScript, EM_LINEINDEX, (long)lineNumber - 1, 0 );
 		SendMessage ( mWndScript, EM_SETSEL, c, c + 1 );
 		SendMessage ( mWndScript, EM_SCROLLCARET, 0, 0 );
-		c = SendMessage ( mWndScript, EM_LINEINDEX, lineNumber, 0 );
+		c = SendMessage ( mWndScript, EM_LINEINDEX, (long)lineNumber, 0 );
 		SendMessage ( mWndScript, EM_SETSEL, c, c );
 	}
 	else
@@ -1843,6 +2210,8 @@ void rvDebuggerWindow::ToggleBreakpoint ( void )
 
 	// Force a repaint of the script window
 	InvalidateRect ( mWndScript, NULL, FALSE );
+
+	UpdateBreakpointList();
 }
 
 /*
@@ -1895,6 +2264,8 @@ void rvDebuggerWindow::CreateToolbar ( void )
 	SendMessage( mWndToolbar, TB_ADDBITMAP, (WPARAM)4, (LPARAM) &tbab );
 
 	// Add the buttons to the toolbar
+	// FIXME:  warning C4838: conversion from 'int' to 'BYTE' requires a narrowing conversion
+	// most probably because TBBUTTON has 4 more bytes in bReserved for alignment on _WIN64
 	TBBUTTON tbb[] = { { 0, 0,					TBSTATE_ENABLED, BTNS_SEP,    0, 0, -1 },
 					   { 8, ID_DBG_FILE_OPEN,	TBSTATE_ENABLED, BTNS_BUTTON, 0, 0, -1 },
 					   { 0, 0,					TBSTATE_ENABLED, BTNS_SEP,    0, 0, -1 },
@@ -2013,6 +2384,7 @@ int rvDebuggerWindow::HandleActivate ( WPARAM wparam, LPARAM lparam )
 			}
 		}
 	}
+	UpdateBreakpointList();
 
 	return 1;
 }
@@ -2193,7 +2565,7 @@ then the last text used will be searched for.
 */
 bool rvDebuggerWindow::FindNext ( const char* text )
 {
-	int		 start;
+	long	 start;
 	FINDTEXT ft;
 
 	if ( text )
@@ -2230,7 +2602,7 @@ bool rvDebuggerWindow::FindNext ( const char* text )
 		}
 	}
 
-	SendMessage ( mWndScript, EM_SETSEL, start, start + mFind.Length() );
+	SendMessage ( mWndScript, EM_SETSEL, start, start + (long)mFind.Length() );
 	SendMessage ( mWndScript, EM_SCROLLCARET, 0, 0 );
 
 	return true;
@@ -2247,7 +2619,7 @@ then the last text used will be searched for.
 */
 bool rvDebuggerWindow::FindPrev ( const char* text )
 {
-	int		 start;
+	long	 start;
 	FINDTEXT ft;
 
 	if ( text )
diff --git a/neo/tools/debugger/DebuggerWindow.h b/neo/tools/debugger/DebuggerWindow.h
index d1e5747..949115a 100644
--- a/neo/tools/debugger/DebuggerWindow.h
+++ b/neo/tools/debugger/DebuggerWindow.h
@@ -52,39 +52,58 @@ public:
 	rvDebuggerWindow ( );
 	~rvDebuggerWindow ( );
 
-	bool			Create				( HINSTANCE hInstance );
+	bool							Create				( HINSTANCE hInstance );
 
-	static bool		Activate			( void );
+	static bool						Activate			( void );
 
-	void			ProcessNetMessage	( msg_t* msg );
+	void							ProcessNetMessage	( idBitMsg * msg );
 
-	void			Printf				( const char* format, ... );
+	void							Printf				( const char* format, ... );
 
-	HWND			GetWindow			( void );
+	HWND							GetWindow			( void );
 
-	void			AddWatch			( const char* name, bool update = true );
+	void							AddWatch			( const char* name, bool update = true );
 
-	HINSTANCE		GetInstance			( void );
-
-protected:
-
-	bool					FindPrev			( const char* text = NULL );
-	bool					FindNext			( const char* text = NULL );
-
-	void					UpdateWatch			( void );
-	void					UpdateWindowMenu	( void );
-	void					UpdateScript		( void );
-	void					UpdateToolbar		( void );
-	void					UpdateTitle			( void );
-	void					UpdateCallstack		( void );
-	void					UpdateRecentFiles	( void );
-	bool					OpenScript			( const char* filename, int lineNumber = -1  );
-	void					EnableWindows		( bool state );
-
-	int						GetSelectedText		( idStr& text );
-
-	void					ToggleBreakpoint	( void );
+	HINSTANCE						GetInstance			( void );
 
+private:
+	bool							RegisterClass	( void );
+	void							CreateToolbar	( void );
+	bool							InitRecentFiles	( void );
+
+	int								HandleInitMenu			( WPARAM wParam, LPARAM lParam );
+	int								HandleCommand			( WPARAM wParam, LPARAM lParam );
+	int								HandleCreate			( WPARAM wparam, LPARAM lparam );
+	int								HandleActivate			( WPARAM wparam, LPARAM lparam );
+	int								HandleDrawItem			( WPARAM wparam, LPARAM lparam );
+	void							HandleTooltipGetDispInfo( WPARAM wparam, LPARAM lparam );
+
+	void							ResizeImageList				( int& widthOut, int& heightOut);
+	static LRESULT					CALLBACK WndProc			( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
+	static LRESULT					CALLBACK MarginWndProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
+	static LRESULT					CALLBACK ScriptWndProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
+	static INT_PTR					CALLBACK AboutDlgProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
+	static int						CALLBACK ScriptWordBreakProc( LPTSTR text, int current, int max, int action );
+
+	bool							FindPrev			( const char* text = NULL );
+	bool							FindNext			( const char* text = NULL );
+
+	void							UpdateBreakpointList( void );
+	void							UpdateScriptList	( void );
+	void							UpdateWatch			( void );
+	void							UpdateWindowMenu	( void );
+	void							UpdateScript		( void );
+	void							UpdateToolbar		( void );
+	void							UpdateTitle			( void );
+	void							UpdateCallstack		( void );
+	void							UpdateRecentFiles	( void );
+	bool							OpenScript			( const char* filename, int lineNumber = -1, idProgram* program = NULL );
+	void							EnableWindows		( bool state );
+
+	int								GetSelectedText		( idStr& text );
+
+	void							ToggleBreakpoint	( void );
+	float							GetMarginWidth      ( void );
 	HWND							mWnd;
 	HWND							mWndScript;
 	HWND							mWndOutput;
@@ -92,7 +111,10 @@ protected:
 	HWND							mWndTabs;
 	HWND							mWndBorder;
 	HWND							mWndConsole;
+	HWND							mWndConsoleInput;
 	HWND							mWndCallstack;
+	HWND							mWndScriptList;
+	HWND							mWndBreakList; // list of breakpoints
 	HWND							mWndWatch;
 	HWND							mWndThreads;
 	HWND							mWndToolTips;
@@ -108,6 +130,7 @@ protected:
 
 	HINSTANCE						mInstance;
 	HIMAGELIST						mImageList;
+	HIMAGELIST						mTmpImageList;
 
 	RECT							mSplitterRect;
 	bool							mSplitterDrag;
@@ -129,25 +152,6 @@ protected:
 	rvDebuggerClient*				mClient;
 
 	rvDebuggerWatchList				mWatches;
-
-private:
-
-	bool		RegisterClass				( void );
-	void		CreateToolbar				( void );
-	bool		InitRecentFiles				( void );
-
-	int			HandleInitMenu				( WPARAM wParam, LPARAM lParam );
-	int			HandleCommand				( WPARAM wParam, LPARAM lParam );
-	int			HandleCreate				( WPARAM wparam, LPARAM lparam );
-	int			HandleActivate				( WPARAM wparam, LPARAM lparam );
-	int			HandleDrawItem				( WPARAM wparam, LPARAM lparam );
-	void		HandleTooltipGetDispInfo	( WPARAM wparam, LPARAM lparam );
-
-	static LRESULT CALLBACK WndProc				( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
-	static LRESULT CALLBACK MarginWndProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
-	static LRESULT CALLBACK ScriptWndProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
-	static INT_PTR CALLBACK AboutDlgProc		( HWND wnd, UINT msg, WPARAM wparam, LPARAM lparam );
-	static int     CALLBACK ScriptWordBreakProc ( LPTSTR text, int current, int max, int action );
 };
 
 /*
diff --git a/neo/tools/debugger/debugger.cpp b/neo/tools/debugger/debugger.cpp
index 4e5d5f6..3bb9043 100644
--- a/neo/tools/debugger/debugger.cpp
+++ b/neo/tools/debugger/debugger.cpp
@@ -26,26 +26,26 @@ If you have questions concerning this license or the applicable additional terms
 ===========================================================================
 */
 
+#if defined( ID_ALLOW_TOOLS )
 #include "tools/edit_gui_common.h"
-
-
 #include "../../sys/win32/rc/debugger_resource.h"
 #include "DebuggerApp.h"
-#include "DebuggerServer.h"
+#else
+#include "debugger_common.h"
+#endif
 
-DWORD CALLBACK DebuggerThread ( LPVOID param );
+#include "DebuggerServer.h"
 
-rvDebuggerApp					gDebuggerApp;
-HWND							gDebuggerWindow = NULL;
-bool							gDebuggerSuspend = false;
-bool							gDebuggerConnnected = false;
-HANDLE							gDebuggerGameThread = NULL;
+#if defined( ID_ALLOW_TOOLS )
+rvDebuggerApp					gDebuggerApp; // this is also used in other source files
+static HWND						gDebuggerWindow = NULL;
+#endif
 
-rvDebuggerServer*				gDebuggerServer			= NULL;
-HANDLE							gDebuggerServerThread   = NULL;
-DWORD							gDebuggerServerThreadID = 0;
-bool							gDebuggerServerQuit     = false;
+static rvDebuggerServer*		gDebuggerServer			= NULL;
+static SDL_Thread*				gDebuggerServerThread   = NULL;
+static bool						gDebuggerServerQuit     = false;
 
+#if defined( ID_ALLOW_TOOLS )
 /*
 ================
 DebuggerMain
@@ -65,6 +65,9 @@ void DebuggerClientInit( const char *cmdline )
 	{
 		goto DebuggerClientInitDone;
 	}
+	
+	// hide the doom window by default
+	::ShowWindow( win32.hWnd, SW_HIDE );
 
 	gDebuggerApp.Run ( );
 
@@ -113,6 +116,7 @@ void DebuggerClientLaunch ( void )
 	CloseHandle ( process.hThread );
 	CloseHandle ( process.hProcess );
 }
+#endif // #if defined( ID_ALLOW_TOOLS )
 
 /*
 ================
@@ -121,14 +125,14 @@ DebuggerServerThread
 Thread proc for the debugger server
 ================
 */
-DWORD CALLBACK DebuggerServerThread ( LPVOID param )
+static int SDLCALL DebuggerServerThread ( void *param )
 {
 	assert ( gDebuggerServer );
 
 	while ( !gDebuggerServerQuit )
 	{
 		gDebuggerServer->ProcessMessages ( );
-		Sleep ( 1 );
+		SDL_Delay( 1 );
 	}
 
 	return 0;
@@ -143,8 +147,17 @@ Starts up the debugger server
 */
 bool DebuggerServerInit ( void )
 {
+	com_enableDebuggerServer.ClearModified( );
+
+	if ( !com_debuggerSupported )
+	{
+		common->Warning( "Called DebuggerServerInit() without the gameDLL supporting it!\n" );
+		return false;
+	}
+
 	// Dont do this if we are in the debugger already
-	if ( com_editors & EDITOR_DEBUGGER )
+	if ( gDebuggerServer != NULL 
+		|| ( com_editors & EDITOR_DEBUGGER ) )
 	{
 		return false;
 	}
@@ -163,9 +176,13 @@ bool DebuggerServerInit ( void )
 		gDebuggerServer = NULL;
 		return false;
 	}
-
+	
 	// Start the debugger server thread
-	gDebuggerServerThread = CreateThread ( NULL, 0, DebuggerServerThread, 0, 0, &gDebuggerServerThreadID );
+#if SDL_VERSION_ATLEAST(2, 0, 0)
+	gDebuggerServerThread = SDL_CreateThread( DebuggerServerThread, "DebuggerServer", NULL );
+#else // SDL 1.2
+	gDebuggerServerThread = SDL_CreateThread( DebuggerServerThread, NULL );
+#endif
 
 	return true;
 }
@@ -179,13 +196,14 @@ Shuts down the debugger server
 */
 void DebuggerServerShutdown ( void )
 {
-	if ( gDebuggerServerThread )
+	if ( gDebuggerServerThread != NULL )
 	{
 		// Signal the debugger server to quit
 		gDebuggerServerQuit = true;
 
 		// Wait for the thread to finish
-		WaitForSingleObject ( gDebuggerServerThread, INFINITE );
+		SDL_WaitThread( gDebuggerServerThread, NULL );
+		gDebuggerServerThread = NULL;
 
 		// Shutdown the server now
 		gDebuggerServer->Shutdown();
@@ -193,10 +211,10 @@ void DebuggerServerShutdown ( void )
 		delete gDebuggerServer;
 		gDebuggerServer = NULL;
 
-		// Cleanup the thread handle
-		CloseHandle ( gDebuggerServerThread );
-		gDebuggerServerThread = NULL;
+		com_editors &= ~EDITOR_DEBUGGER;
 	}
+
+	com_enableDebuggerServer.ClearModified( );
 }
 
 /*
diff --git a/neo/tools/debugger/debugger_common.h b/neo/tools/debugger/debugger_common.h
new file mode 100644
index 0000000..e98962c
--- /dev/null
+++ b/neo/tools/debugger/debugger_common.h
@@ -0,0 +1,162 @@
+// Header that includes all the other needed headers; a replacement for precompiled.h (only used by the debugger).
+// This could be cleaned up more.
+
+#ifndef DEBUGGER_COMMON_H
+#define DEBUGGER_COMMON_H
+
+#include "framework/Game.h"
+
+// non-portable system services
+#include "sys/platform.h"
+#include "sys/sys_public.h"
+
+// id lib
+#include "idlib/Lib.h"
+
+// memory management and arrays
+#include "idlib/Heap.h"
+#include "idlib/containers/List.h"
+
+// math
+#include "idlib/math/Simd.h"
+#include "idlib/math/Math.h"
+#include "idlib/math/Random.h"
+#include "idlib/math/Complex.h"
+#include "idlib/math/Vector.h"
+#include "idlib/math/Matrix.h"
+#include "idlib/math/Angles.h"
+#include "idlib/math/Quat.h"
+#include "idlib/math/Rotation.h"
+#include "idlib/math/Plane.h"
+#include "idlib/math/Pluecker.h"
+#include "idlib/math/Polynomial.h"
+#include "idlib/math/Extrapolate.h"
+#include "idlib/math/Interpolate.h"
+#include "idlib/math/Curve.h"
+#include "idlib/math/Ode.h"
+#include "idlib/math/Lcp.h"
+
+// bounding volumes
+#include "idlib/bv/Sphere.h"
+#include "idlib/bv/Bounds.h"
+#include "idlib/bv/Box.h"
+#include "idlib/bv/Frustum.h"
+
+// geometry
+#include "idlib/geometry/DrawVert.h"
+#include "idlib/geometry/JointTransform.h"
+#include "idlib/geometry/Winding.h"
+#include "idlib/geometry/Winding2D.h"
+#include "idlib/geometry/Surface.h"
+#include "idlib/geometry/Surface_Patch.h"
+#include "idlib/geometry/Surface_Polytope.h"
+#include "idlib/geometry/Surface_SweptSpline.h"
+#include "idlib/geometry/TraceModel.h"
+
+// text manipulation
+#include "idlib/Str.h"
+#include "idlib/Token.h"
+#include "idlib/Lexer.h"
+#include "idlib/Parser.h"
+#include "idlib/Base64.h"
+#include "idlib/CmdArgs.h"
+
+// containers
+#include "idlib/containers/BTree.h"
+#include "idlib/containers/BinSearch.h"
+#include "idlib/containers/HashIndex.h"
+#include "idlib/containers/HashTable.h"
+#include "idlib/containers/StaticList.h"
+#include "idlib/containers/LinkList.h"
+#include "idlib/containers/Hierarchy.h"
+#include "idlib/containers/Queue.h"
+#include "idlib/containers/Stack.h"
+#include "idlib/containers/StrList.h"
+#include "idlib/containers/StrPool.h"
+#include "idlib/containers/VectorSet.h"
+#include "idlib/containers/PlaneSet.h"
+
+// hashing
+#include "idlib/hashing/CRC32.h"
+#include "idlib/hashing/MD4.h"
+#include "idlib/hashing/MD5.h"
+
+// misc
+#include "idlib/Dict.h"
+#include "idlib/LangDict.h"
+#include "idlib/BitMsg.h"
+#include "idlib/MapFile.h"
+#include "idlib/Timer.h"
+
+// framework
+#include "framework/BuildVersion.h"
+#include "framework/Licensee.h"
+#include "framework/CmdSystem.h"
+#include "framework/CVarSystem.h"
+#include "framework/Common.h"
+#include "framework/File.h"
+#include "framework/FileSystem.h"
+#include "framework/UsercmdGen.h"
+
+// decls
+#include "framework/DeclManager.h"
+#include "framework/DeclTable.h"
+#include "framework/DeclSkin.h"
+#include "framework/DeclEntityDef.h"
+#include "framework/DeclFX.h"
+#include "framework/DeclParticle.h"
+#include "framework/DeclAF.h"
+#include "framework/DeclPDA.h"
+
+// We have expression parsing and evaluation code in multiple places:
+// materials, sound shaders, and guis. We should unify them.
+
+// renderer
+#include "renderer/qgl.h"
+#include "renderer/Cinematic.h"
+#include "renderer/Material.h"
+#include "renderer/Model.h"
+#include "renderer/ModelManager.h"
+#include "renderer/RenderSystem.h"
+#include "renderer/RenderWorld.h"
+
+// sound engine
+#include "sound/sound.h"
+
+// asynchronous networking
+#include "framework/async/NetworkSystem.h"
+
+// user interfaces
+#include "ui/ListGUI.h"
+#include "ui/UserInterface.h"
+
+// collision detection system
+#include "cm/CollisionModel.h"
+
+// AAS files and manager
+#include "tools/compilers/aas/AASFile.h"
+#include "tools/compilers/aas/AASFileManager.h"
+
+// game interface
+#include "framework/Game.h"
+
+//-----------------------------------------------------
+
+#include "framework/DemoChecksum.h"
+
+// framework
+#include "framework/Compressor.h"
+#include "framework/EventLoop.h"
+#include "framework/KeyInput.h"
+#include "framework/EditField.h"
+#include "framework/Console.h"
+#include "framework/DemoFile.h"
+#include "framework/Session.h"
+
+// asynchronous networking
+#include "framework/async/AsyncNetwork.h"
+
+// Compilers for map, model, video etc. processing.
+#include "tools/compilers/compiler_public.h"
+
+#endif // DEBUGGER_COMMON_H
diff --git a/neo/tools/decl/DialogDeclBrowser.cpp b/neo/tools/decl/DialogDeclBrowser.cpp
index 7186ca3..e66a9bc 100644
--- a/neo/tools/decl/DialogDeclBrowser.cpp
+++ b/neo/tools/decl/DialogDeclBrowser.cpp
@@ -559,8 +559,11 @@ DialogDeclBrowser::OnToolTipNotify
 */
 BOOL DialogDeclBrowser::OnToolTipNotify( UINT id, NMHDR *pNMHDR, LRESULT *pResult ) {
 	// need to handle both ANSI and UNICODE versions of the message
+#ifdef _UNICODE
 	TOOLTIPTEXTA* pTTTA = (TOOLTIPTEXTA*)pNMHDR;
+#else
 	TOOLTIPTEXTW* pTTTW = (TOOLTIPTEXTW*)pNMHDR;
+#endif
 
 	if ( pNMHDR->hwndFrom == declTree.GetSafeHwnd() ) {
 		CString toolTip;
@@ -656,43 +659,49 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 
 	GetClientRect( clientRect );
 
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	float scaled_toolbar_height = (TOOLBAR_HEIGHT * scaling_factor);
+	float scaled_button_space = (BUTTON_SPACE * scaling_factor);
+	float scaled_border_size = (BORDER_SIZE * scaling_factor);
+
+
 	if ( declTree.GetSafeHwnd() ) {
-		rect.left = BORDER_SIZE;
-		rect.top = BORDER_SIZE;
-		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - 100;
+		rect.left = scaled_border_size;
+		rect.top = scaled_border_size;
+		rect.right = clientRect.Width() - scaled_border_size;
+		rect.bottom = clientRect.Height() - (100 * scaling_factor);
 		declTree.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
 	if ( findNameStatic.GetSafeHwnd() ) {
-		rect.left = BORDER_SIZE + 2;
-		rect.top = clientRect.Height() - 100 + BUTTON_SPACE + 2;
-		rect.right = BORDER_SIZE + 80;
-		rect.bottom = clientRect.Height() - 76 + 2;
+		rect.left = scaled_border_size + (2 * scaling_factor);
+		rect.top = clientRect.Height() - (98 * scaling_factor) + scaled_button_space;
+		rect.right = scaled_border_size + (80 * scaling_factor);
+		rect.bottom = clientRect.Height() - (74 * scaling_factor);
 		findNameStatic.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
 	if ( findTextStatic.GetSafeHwnd() ) {
-		rect.left = BORDER_SIZE + 2;
-		rect.top = clientRect.Height() - 78 + BUTTON_SPACE + 2;
-		rect.right = BORDER_SIZE + 80;
-		rect.bottom = clientRect.Height() - 54 + 2;
+		rect.left = scaled_border_size + (2 * scaling_factor);
+		rect.top = clientRect.Height() - (76 * scaling_factor) + scaled_button_space;
+		rect.right = scaled_border_size + (80 * scaling_factor);
+		rect.bottom = clientRect.Height() - (52 * scaling_factor);
 		findTextStatic.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
 	if ( findNameEdit.GetSafeHwnd() ) {
-		rect.left = BORDER_SIZE + 80;
-		rect.top = clientRect.Height() - 100 + BUTTON_SPACE;
-		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - 76;
+		rect.left = scaled_border_size + (80 * scaling_factor);
+		rect.top = clientRect.Height() - (100 * scaling_factor) + scaled_button_space;
+		rect.right = clientRect.Width() - scaled_border_size;
+		rect.bottom = clientRect.Height() - (76 * scaling_factor);
 		findNameEdit.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
 	if ( findTextEdit.GetSafeHwnd() ) {
-		rect.left = BORDER_SIZE + 80;
-		rect.top = clientRect.Height() - 78 + BUTTON_SPACE;
-		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - 54;
+		rect.left = scaled_border_size + (80 * scaling_factor);
+		rect.top = clientRect.Height() - (78 * scaling_factor) + scaled_button_space;
+		rect.right = clientRect.Width() - scaled_border_size;
+		rect.bottom = clientRect.Height() - (54 * scaling_factor);
 		findTextEdit.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -700,10 +709,10 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 		findButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = BORDER_SIZE;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = BORDER_SIZE + width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = scaled_border_size;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = scaled_border_size + width;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		findButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -711,10 +720,10 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 		editButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = BORDER_SIZE + BUTTON_SPACE + width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = BORDER_SIZE + BUTTON_SPACE + 2 * width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = scaled_border_size + scaled_button_space + width;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = scaled_border_size + scaled_button_space + 2 * width;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		editButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -722,10 +731,10 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 		newButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = BORDER_SIZE + 2 * BUTTON_SPACE + 2 * width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = BORDER_SIZE + 2 * BUTTON_SPACE + 3 * width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = scaled_border_size + 2 * scaled_button_space + 2 * width;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = scaled_border_size + 2 * scaled_button_space + 3 * width;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		newButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -733,10 +742,10 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 		reloadButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = BORDER_SIZE + 3 * BUTTON_SPACE + 3 * width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = BORDER_SIZE + 3 * BUTTON_SPACE + 4 * width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = scaled_border_size + 3 * scaled_button_space + 3 * width;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = scaled_border_size + 3 * scaled_button_space + 4 * width;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		reloadButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -744,18 +753,18 @@ void DialogDeclBrowser::OnSize( UINT nType, int cx, int cy ) {
 		cancelButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = clientRect.Width() - BORDER_SIZE - width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = clientRect.Width() - scaled_border_size - width;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = clientRect.Width() - scaled_border_size;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		cancelButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
 	if ( statusBar.GetSafeHwnd() ) {
-		rect.left = clientRect.Width() - 2;
-		rect.top = clientRect.Height() - 2;
-		rect.right = clientRect.Width() - 2;
-		rect.bottom = clientRect.Height() - 2;
+		rect.left = clientRect.Width() - (2 * scaling_factor);
+		rect.top = clientRect.Height() - (2 * scaling_factor);
+		rect.right = clientRect.Width() - (2 * scaling_factor);
+		rect.bottom = clientRect.Height() - (2 * scaling_factor);
 		statusBar.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -807,7 +816,6 @@ DialogDeclBrowser::OnTreeSelChanged
 ================
 */
 void DialogDeclBrowser::OnTreeSelChanged( NMHDR* pNMHDR, LRESULT* pResult ) {
-	LV_KEYDOWN* pLVKeyDow = (LV_KEYDOWN*)pNMHDR;
 
 	const idDecl *decl = GetSelectedDecl();
 	if ( decl ) {
diff --git a/neo/tools/decl/DialogDeclEditor.cpp b/neo/tools/decl/DialogDeclEditor.cpp
index b20a72d..cef1c29 100644
--- a/neo/tools/decl/DialogDeclEditor.cpp
+++ b/neo/tools/decl/DialogDeclEditor.cpp
@@ -234,20 +234,22 @@ void DialogDeclEditor::LoadDecl( idDecl *decl ) {
 	}
 
 	SetWindowText( va( "Declaration Editor (%s, line %d)", decl->GetFileName(), decl->GetLineNum() ) );
+	
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
 
 	rect.left = initialRect.left;
-	rect.right = rect.left + maxCharsPerLine * FONT_WIDTH + 32;
+	rect.right = rect.left + (maxCharsPerLine * FONT_WIDTH + 32) *scaling_factor;
 	rect.top = initialRect.top;
-	rect.bottom = rect.top + numLines * (FONT_HEIGHT+8) + 24 + 56;
+	rect.bottom = rect.top + (numLines * (FONT_HEIGHT+8) + 24 + 56)* scaling_factor;
 	if ( rect.right < initialRect.right ) {
 		rect.right = initialRect.right;
-	} else if ( rect.right - rect.left > 1024 ) {
-		rect.right = rect.left + 1024;
+	} else if ( rect.right - rect.left > (1024 * scaling_factor) ) {
+		rect.right = rect.left + (1024 * scaling_factor);
 	}
 	if ( rect.bottom < initialRect.bottom ) {
 		rect.bottom = initialRect.bottom;
-	} else if ( rect.bottom - rect.top > 768 ) {
-		rect.bottom = rect.top + 768;
+	} else if ( rect.bottom - rect.top > (768 * scaling_factor)  ) {
+		rect.bottom = rect.top + (768 * scaling_factor);
 	}
 	MoveWindow( rect );
 
@@ -383,12 +385,15 @@ void DialogDeclEditor::OnSize( UINT nType, int cx, int cy ) {
 	CDialog::OnSize( nType, cx, cy );
 
 	GetClientRect( clientRect );
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	float scaled_toolbar_height = (TOOLBAR_HEIGHT * scaling_factor);
+	float scaled_button_space = (BUTTON_SPACE * scaling_factor);
 
 	if ( declEdit.GetSafeHwnd() ) {
 		rect.left = BORDER_SIZE;
 		rect.top = BORDER_SIZE;
 		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - 56;
+		rect.bottom = clientRect.Height() - (56 * scaling_factor);
 		declEdit.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -397,9 +402,9 @@ void DialogDeclEditor::OnSize( UINT nType, int cx, int cy ) {
 		int width = rect.Width();
 		int height = rect.Height();
 		rect.left = BORDER_SIZE;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
 		rect.right = BORDER_SIZE + width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		testButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -407,10 +412,10 @@ void DialogDeclEditor::OnSize( UINT nType, int cx, int cy ) {
 		okButton.GetClientRect( rect );
 		int width = rect.Width();
 		int height = rect.Height();
-		rect.left = clientRect.Width() - BORDER_SIZE - BUTTON_SPACE - 2 * width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
-		rect.right = clientRect.Width() - BORDER_SIZE - BUTTON_SPACE - width;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.left = clientRect.Width() - BORDER_SIZE - scaled_button_space - 2 * width;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
+		rect.right = clientRect.Width() - BORDER_SIZE - scaled_button_space - width;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		okButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
@@ -419,9 +424,9 @@ void DialogDeclEditor::OnSize( UINT nType, int cx, int cy ) {
 		int width = rect.Width();
 		int height = rect.Height();
 		rect.left = clientRect.Width() - BORDER_SIZE - width;
-		rect.top = clientRect.Height() - TOOLBAR_HEIGHT - height;
+		rect.top = clientRect.Height() - scaled_toolbar_height - height;
 		rect.right = clientRect.Width() - BORDER_SIZE;
-		rect.bottom = clientRect.Height() - TOOLBAR_HEIGHT;
+		rect.bottom = clientRect.Height() - scaled_toolbar_height;
 		cancelButton.MoveWindow( rect.left, rect.top, rect.Width(), rect.Height() );
 	}
 
diff --git a/neo/tools/edit_gui_common.h b/neo/tools/edit_gui_common.h
index 48b43c1..77078b8 100644
--- a/neo/tools/edit_gui_common.h
+++ b/neo/tools/edit_gui_common.h
@@ -179,4 +179,6 @@
 // Compilers for map, model, video etc. processing.
 #include "tools/compilers/compiler_public.h"
 
+// scaling factor based on DPI (dpi/96.0f, so 1.0 by default); implemented in win_main.cpp
+float Win_GetWindowScalingFactor(HWND window);
 #endif // TOOLS_EDIT_GUI_COMMON_H
diff --git a/neo/tools/edit_stub.cpp b/neo/tools/edit_stub.cpp
index 79c2c88..34d99e1 100644
--- a/neo/tools/edit_stub.cpp
+++ b/neo/tools/edit_stub.cpp
@@ -67,10 +67,6 @@ bool	GUIEditorHandleMessage( void *msg ) { return false; }
 
 void	DebuggerClientLaunch( void ) {}
 void	DebuggerClientInit( const char *cmdline ) { common->Printf( "The Script Debugger Client only runs on Win32\n" ); }
-bool	DebuggerServerInit( void ) { return false; }
-void	DebuggerServerShutdown( void ) {}
-void	DebuggerServerPrint( const char *text ) {}
-void	DebuggerServerCheckBreakpoint( idInterpreter *interpreter, idProgram *program, int instructionPointer ) {}
 
 void	PDAEditorInit( const idDict *spawnArgs ) { common->Printf( "The PDA editor only runs on Win32\n" ); }
 
diff --git a/neo/tools/guied/GEApp.cpp b/neo/tools/guied/GEApp.cpp
index 440d307..9aae402 100644
--- a/neo/tools/guied/GEApp.cpp
+++ b/neo/tools/guied/GEApp.cpp
@@ -123,7 +123,7 @@ bool rvGEApp::Initialize ( void )
 		return false;
 	}
 
-	SetClassLong( mMDIFrame, GCL_HICON, ( LONG )LoadIcon( win32.hInstance, MAKEINTRESOURCE( IDI_GUIED ) ) );
+	SetClassLongPtr( mMDIFrame, GCLP_HICON, (LONG_PTR)LoadIcon( win32.hInstance, MAKEINTRESOURCE( IDI_GUIED ) ) );
 
 	// Create the MDI window
 	CLIENTCREATESTRUCT ccs;
@@ -263,7 +263,7 @@ Main frame window procedure
 */
 LRESULT CALLBACK rvGEApp::FrameWndProc ( HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEApp* app = (rvGEApp*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGEApp* app = (rvGEApp*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	switch ( uMsg )
 	{
@@ -348,7 +348,7 @@ LRESULT CALLBACK rvGEApp::FrameWndProc ( HWND hWnd, UINT uMsg, WPARAM wParam, LP
 
 			assert ( app );
 
-			SetWindowLong ( hWnd, GWL_USERDATA, (LONG)app );
+			SetWindowLongPtr( hWnd, GWLP_USERDATA, (LONG_PTR)app );
 
 			app->mMDIFrame = hWnd;
 
@@ -365,9 +365,9 @@ LRESULT CALLBACK rvGEApp::FrameWndProc ( HWND hWnd, UINT uMsg, WPARAM wParam, LP
 			app->mToolWindows.Append ( app->mProperties.GetWindow ( ) );
 			app->mToolWindows.Append ( app->mTransformer.GetWindow ( ) );
 
-			SendMessage ( app->mNavigator.GetWindow ( ), WM_NCACTIVATE, true, (LONG)-1 );
-			SendMessage ( app->mProperties.GetWindow ( ), WM_NCACTIVATE, true, (LONG)-1 );
-			SendMessage ( app->mTransformer.GetWindow ( ), WM_NCACTIVATE, true, (LONG)-1 );
+			SendMessage ( app->mNavigator.GetWindow ( ), WM_NCACTIVATE, true, (LONG_PTR)-1 );
+			SendMessage ( app->mProperties.GetWindow ( ), WM_NCACTIVATE, true, (LONG_PTR)-1 );
+			SendMessage ( app->mTransformer.GetWindow ( ), WM_NCACTIVATE, true, (LONG_PTR)-1 );
 
 			break;
 		}
@@ -385,7 +385,7 @@ MDI Child window procedure
 */
 LRESULT CALLBACK rvGEApp::MDIChildProc ( HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEWorkspace* workspace = (rvGEWorkspace*)GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGEWorkspace* workspace = (rvGEWorkspace*)GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	// Give the active workspace a chance to play with it
 	if ( workspace )
@@ -572,10 +572,10 @@ int rvGEApp::HandleCommand ( WPARAM wParam, LPARAM lParam )
 			break;
 
 		case ID_GUIED_TOOLS_RELOADMATERIALS:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_WAIT) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_WAIT ) );
 			cmdSystem->BufferCommandText( CMD_EXEC_NOW, "reloadImages\n" );
 			cmdSystem->BufferCommandText( CMD_EXEC_NOW, "reloadMaterials\n" );
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 			break;
 
 		case ID_GUIED_EDIT_COPY:
@@ -1159,7 +1159,7 @@ bool rvGEApp::NewFile ( void )
 								480,
 								mMDIClient,
 								mInstance,
-								(LONG)workspace );
+								(LONG_PTR)workspace );
 
 		ShowWindow ( child, SW_SHOW );
 	}
@@ -1191,7 +1191,7 @@ bool rvGEApp::OpenFile ( const char* filename )
 		}
 	}
 
-	SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_WAIT ) ) );
+	SetCursor ( LoadCursor ( NULL, IDC_WAIT ) );
 
 	// Setup the default error.
 	error = va("Failed to parse '%s'", filename );
@@ -1210,7 +1210,7 @@ bool rvGEApp::OpenFile ( const char* filename )
 								480,
 								mMDIClient,
 								mInstance,
-								(LONG)workspace );
+								(LONG_PTR)workspace );
 
 		ShowWindow ( child, SW_SHOW );
 
@@ -1224,7 +1224,7 @@ bool rvGEApp::OpenFile ( const char* filename )
 		MessageBox ( error, MB_OK|MB_ICONERROR );
 	}
 
-	SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW ) ) );
+	SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 
 	return result;;
 }
@@ -1362,7 +1362,7 @@ int	rvGEApp::ToolWindowActivate ( HWND hwnd, UINT msg, WPARAM wParam, LPARAM lPa
 		{
 			if ( mToolWindows[i] != hwnd &&	mToolWindows[i] != (HWND) lParam )
 			{
-				SendMessage ( mToolWindows[i], WM_NCACTIVATE, keepActive, (LONG)-1 );
+				SendMessage ( mToolWindows[i], WM_NCACTIVATE, keepActive, (LONG_PTR)-1 );
 			}
 		}
 	}
diff --git a/neo/tools/guied/GECheckInDlg.cpp b/neo/tools/guied/GECheckInDlg.cpp
index 1fd627b..5b540a3 100644
--- a/neo/tools/guied/GECheckInDlg.cpp
+++ b/neo/tools/guied/GECheckInDlg.cpp
@@ -49,12 +49,12 @@ Dialog procedure for the check in dialog
 */
 static INT_PTR CALLBACK GECheckInDlg_GeneralProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	GECHECKINDLG* dlg = (GECHECKINDLG*) GetWindowLong ( hwnd, GWL_USERDATA );
+	GECHECKINDLG* dlg = (GECHECKINDLG*) GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
 		case WM_INITDIALOG:
-			SetWindowLong ( hwnd, GWL_USERDATA, lParam );
+			SetWindowLongPtr ( hwnd, GWLP_USERDATA, lParam );
 			dlg = (GECHECKINDLG*) lParam;
 
 			SetWindowText ( GetDlgItem ( hwnd, IDC_GUIED_FILENAME ), dlg->mFilename );
diff --git a/neo/tools/guied/GEItemPropsDlg.cpp b/neo/tools/guied/GEItemPropsDlg.cpp
index b717f67..e06f75d 100644
--- a/neo/tools/guied/GEItemPropsDlg.cpp
+++ b/neo/tools/guied/GEItemPropsDlg.cpp
@@ -412,7 +412,7 @@ bool rvGEItemPropsTextPage::Init ( void )
 	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTALIGN ), CB_ADDSTRING, 0, (LPARAM)"Center" );
 	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTALIGN ), CB_ADDSTRING, 0, (LPARAM)"Right" );
 
-	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_ADDSTRING, 0, (LONG)"<default>" );
+	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_ADDSTRING, 0, (LONG_PTR)"<default>" );
 
 	idFileList *folders;
 	int		  i;
@@ -424,7 +424,7 @@ bool rvGEItemPropsTextPage::Init ( void )
 			continue;
 		}
 
-		SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_ADDSTRING, 0, (LONG)folders->GetFile(i) );
+		SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_ADDSTRING, 0, (LONG_PTR)folders->GetFile(i) );
 	}
 
 	fileSystem->FreeFileList( folders );
@@ -547,7 +547,7 @@ bool rvGEItemPropsTextPage::SetActive ( void )
 	int   fontSel;
 	font.StripQuotes ( );
 	font.StripPath ( );
-	fontSel = SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_FINDSTRING, -1, (LONG)font.c_str () );
+	fontSel = SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_FINDSTRING, -1, (LONG_PTR)font.c_str () );
 	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_SETCURSEL, 0, 0 );
 	SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_SETCURSEL, fontSel==-1?0:fontSel, 0 );
 
@@ -633,7 +633,7 @@ bool rvGEItemPropsTextPage::KillActive ( void )
 		else
 		{
 			char fontName[MAX_PATH];
-			SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_GETLBTEXT, fontSel, (LONG)fontName );
+			SendMessage ( GetDlgItem ( mPage, IDC_GUIED_ITEMTEXTFONT ), CB_GETLBTEXT, fontSel, (LONG_PTR)fontName );
 			mDict->Set ( "font", idStr("\"fonts/") + idStr(fontName) + idStr("\"" ) );
 		}
 
@@ -706,7 +706,7 @@ INT_PTR CALLBACK ModifyItemKeyDlg_WndProc ( HWND hwnd, UINT msg, WPARAM wParam,
 				SetWindowText ( hwnd, "New Item Key" );
 			}
 
-			SetWindowLong ( hwnd, GWL_USERDATA, lParam );
+			SetWindowLongPtr ( hwnd, GWLP_USERDATA, lParam );
 			return FALSE;
 		}
 
@@ -718,7 +718,7 @@ INT_PTR CALLBACK ModifyItemKeyDlg_WndProc ( HWND hwnd, UINT msg, WPARAM wParam,
 					char key[1024];
 					char value[1024];
 
-					const idKeyValue* keyValue = (const idKeyValue*) GetWindowLong ( hwnd, GWL_USERDATA );
+					const idKeyValue* keyValue = (const idKeyValue*) GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 
 					GetWindowText ( GetDlgItem ( hwnd, IDC_GUIED_ITEMKEY ), key, 1024 );
 					GetWindowText ( GetDlgItem ( hwnd, IDC_GUIED_ITEMVALUE ), value, 1024 );
@@ -816,7 +816,7 @@ int rvGEItemPropsKeysPage::HandleMessage ( UINT msg, WPARAM wParam, LPARAM lPara
 							item.mask = LVIF_TEXT|LVIF_PARAM;
 							item.iItem = ListView_GetItemCount ( list );
 							item.pszText = (LPSTR)key->GetKey().c_str ( );
-							item.lParam = (LONG) key;
+							item.lParam = (LONG_PTR) key;
 							int index = ListView_InsertItem ( list, &item );
 
 							finalValue.StripQuotes ( );
@@ -965,25 +965,28 @@ bool rvGEItemPropsKeysPage::SetActive ( void )
 	// Delete anything already in there
 	ListView_DeleteAllItems ( list );
 
-	// Add each key in the properties dictionary
-	for ( i = 0; i < mDict->GetNumKeyVals(); i ++ )
+	if ( mDict != NULL )
 	{
-		const idKeyValue* key = mDict->GetKeyVal ( i );
-		assert ( key );
-
-		// Add the item
-		LVITEM item;
-		ZeroMemory ( &item, sizeof(item) );
-		item.mask = LVIF_TEXT|LVIF_PARAM;
-		item.iItem = ListView_GetItemCount ( list );
-		item.pszText = (LPSTR)key->GetKey().c_str ( );
-		item.lParam = (LONG) key;
-		int index = ListView_InsertItem ( list, &item );
-
-		idStr value;
-		value = key->GetValue();
-		value.StripQuotes ( );
-		ListView_SetItemText ( list, index, 1, (LPSTR)value.c_str() );
+		// Add each key in the properties dictionary
+		for ( i = 0; i < mDict->GetNumKeyVals(); i ++ )
+		{
+			const idKeyValue* key = mDict->GetKeyVal ( i );
+			assert ( key );
+	
+			// Add the item
+			LVITEM item;
+			ZeroMemory ( &item, sizeof(item) );
+			item.mask = LVIF_TEXT|LVIF_PARAM;
+			item.iItem = ListView_GetItemCount ( list );
+			item.pszText = (LPSTR)key->GetKey().c_str ( );
+			item.lParam = (LONG_PTR) key;
+			int index = ListView_InsertItem ( list, &item );
+	
+			idStr value;
+			value = key->GetValue();
+			value.StripQuotes ( );
+			ListView_SetItemText ( list, index, 1, (LPSTR)value.c_str() );
+		}
 	}
 
 	return true;
@@ -1077,6 +1080,11 @@ bool rvGEItemPropsGeneralPage::SetActive ( void )
 
 	gApp.GetOptions().SetLastOptionsPage ( RVITEMPROPS_GENERAL );
 
+	if ( mDict == NULL )
+	{
+		return false;
+	}
+	
 	SetWindowText ( GetDlgItem ( mPage, IDC_GUIED_ITEMNAME ), idStr(mDict->GetString ( "name", "unnamed" )).StripQuotes ( ) );
 
 	enable = !IsExpression ( mDict->GetString ( "visible", "1" ) );
@@ -1116,6 +1124,11 @@ Applys the settings currently stored in the property page back into the attached
 */
 bool rvGEItemPropsGeneralPage::KillActive ( void )
 {
+	if ( mDict == NULL )
+	{
+		return false;
+	}
+	
 	char temp[1024];
 
 	GetWindowText ( GetDlgItem(mPage,IDC_GUIED_ITEMNAME), temp, 1024 );
@@ -1211,7 +1224,7 @@ bool GEItemPropsDlg_DoModal ( HWND parent, idWindow* window, idDict& dict )
 	propsp[RVITEMPROPS_GENERAL].pszTemplate	= MAKEINTRESOURCE(IDD_GUIED_ITEMPROPS_GENERAL);
 	propsp[RVITEMPROPS_GENERAL].pfnDlgProc	= rvGEPropertyPage::WndProc;
 	propsp[RVITEMPROPS_GENERAL].pszTitle	= "General";
-	propsp[RVITEMPROPS_GENERAL].lParam		= (LONG)new rvGEItemPropsGeneralPage ( &dict, wrapper->GetWindowType ( ) );
+	propsp[RVITEMPROPS_GENERAL].lParam		= (LONG_PTR)new rvGEItemPropsGeneralPage ( &dict, wrapper->GetWindowType ( ) );
 
 	propsp[RVITEMPROPS_IMAGE].dwSize		= sizeof(PROPSHEETPAGE);
 	propsp[RVITEMPROPS_IMAGE].dwFlags		= PSP_USETITLE;
@@ -1219,7 +1232,7 @@ bool GEItemPropsDlg_DoModal ( HWND parent, idWindow* window, idDict& dict )
 	propsp[RVITEMPROPS_IMAGE].pszTemplate	= MAKEINTRESOURCE(IDD_GUIED_ITEMPROPS_IMAGE);
 	propsp[RVITEMPROPS_IMAGE].pfnDlgProc	= rvGEPropertyPage::WndProc;
 	propsp[RVITEMPROPS_IMAGE].pszTitle		= "Image";
-	propsp[RVITEMPROPS_IMAGE].lParam		= (LONG)new rvGEItemPropsImagePage ( &dict );;
+	propsp[RVITEMPROPS_IMAGE].lParam		= (LONG_PTR)new rvGEItemPropsImagePage ( &dict );
 
 	propsp[RVITEMPROPS_TEXT].dwSize			= sizeof(PROPSHEETPAGE);
 	propsp[RVITEMPROPS_TEXT].dwFlags		= PSP_USETITLE;
@@ -1227,7 +1240,7 @@ bool GEItemPropsDlg_DoModal ( HWND parent, idWindow* window, idDict& dict )
 	propsp[RVITEMPROPS_TEXT].pszTemplate	= MAKEINTRESOURCE(IDD_GUIED_ITEMPROPS_TEXT);
 	propsp[RVITEMPROPS_TEXT].pfnDlgProc		= rvGEPropertyPage::WndProc;
 	propsp[RVITEMPROPS_TEXT].pszTitle		= "Text";
-	propsp[RVITEMPROPS_TEXT].lParam			= (LONG)new rvGEItemPropsTextPage ( &dict );;
+	propsp[RVITEMPROPS_TEXT].lParam			= (LONG_PTR)new rvGEItemPropsTextPage ( &dict );
 
 	propsp[RVITEMPROPS_KEYS].dwSize			= sizeof(PROPSHEETPAGE);
 	propsp[RVITEMPROPS_KEYS].dwFlags		= PSP_USETITLE;
@@ -1235,7 +1248,7 @@ bool GEItemPropsDlg_DoModal ( HWND parent, idWindow* window, idDict& dict )
 	propsp[RVITEMPROPS_KEYS].pszTemplate	= MAKEINTRESOURCE(IDD_GUIED_ITEMPROPS_KEYS);
 	propsp[RVITEMPROPS_KEYS].pfnDlgProc		= rvGEPropertyPage::WndProc;
 	propsp[RVITEMPROPS_KEYS].pszTitle		= "Keys";
-	propsp[RVITEMPROPS_KEYS].lParam			= (LONG)new rvGEItemPropsKeysPage ( &dict, wrapper );
+	propsp[RVITEMPROPS_KEYS].lParam			= (LONG_PTR)new rvGEItemPropsKeysPage ( &dict, wrapper );
 
 	propsh.dwSize			= sizeof(PROPSHEETHEADER);
 	propsh.nStartPage		= gApp.GetOptions().GetLastOptionsPage ( );
diff --git a/neo/tools/guied/GEItemScriptsDlg.cpp b/neo/tools/guied/GEItemScriptsDlg.cpp
index e5832d5..a172f11 100644
--- a/neo/tools/guied/GEItemScriptsDlg.cpp
+++ b/neo/tools/guied/GEItemScriptsDlg.cpp
@@ -36,7 +36,7 @@ If you have questions concerning this license or the applicable additional terms
 
 LRESULT CALLBACK GEScriptEdit_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	WNDPROC wndproc = (WNDPROC) GetWindowLong ( hwnd, GWL_USERDATA );
+	WNDPROC wndproc = (WNDPROC) GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -61,7 +61,7 @@ bool GEItescriptsDlg_Init ( HWND hwnd )
 	HWND				script;
 
 	// Extract the window pointer from the win32 windows user data long
-	window = (idWindow*)GetWindowLong ( hwnd, GWL_USERDATA );
+	window = (idWindow*)GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 	assert ( window );
 
 	// Get the window wrapper of the script window
@@ -73,8 +73,8 @@ bool GEItescriptsDlg_Init ( HWND hwnd )
 
 	UINT tabsize = 16;
 	SendMessage ( script, EM_SETTABSTOPS, 1, (LPARAM)&tabsize );
-	SetWindowLong ( script, GWL_USERDATA, GetWindowLong ( script, GWL_WNDPROC ) );
-	SetWindowLong ( script, GWL_WNDPROC, (LONG) GEScriptEdit_WndProc );
+	SetWindowLongPtr ( script, GWLP_USERDATA, GetWindowLongPtr ( script, GWLP_WNDPROC ) );
+	SetWindowLongPtr ( script, GWLP_WNDPROC, (LONG_PTR) GEScriptEdit_WndProc );
 
 	TEXTMETRIC tm;
 	HDC dc;
@@ -138,7 +138,7 @@ bool GEItescriptsDlg_Apply ( HWND hwnd )
 	HWND				script;
 
 	// Extract the window pointer from the win32 windows user data long
-	window = (idWindow*)GetWindowLong ( hwnd, GWL_USERDATA );
+	window = (idWindow*)GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 	assert ( window );
 
 	// Get the window wrapper of the script window
@@ -306,7 +306,7 @@ INT_PTR CALLBACK GEItescriptsDlg_WndProc ( HWND hwnd, UINT msg, WPARAM wParam, L
 	switch ( msg )
 	{
 		case WM_INITDIALOG:
-			SetWindowLong ( hwnd, GWL_USERDATA, lParam );
+			SetWindowLongPtr ( hwnd, GWLP_USERDATA, lParam );
 			GEItescriptsDlg_Init ( hwnd );
 
 			gApp.GetOptions().GetWindowPlacement ( "scripts", hwnd );
diff --git a/neo/tools/guied/GENavigator.cpp b/neo/tools/guied/GENavigator.cpp
index 342c25e..7167d1d 100644
--- a/neo/tools/guied/GENavigator.cpp
+++ b/neo/tools/guied/GENavigator.cpp
@@ -139,7 +139,7 @@ Window Procedure
 */
 LRESULT CALLBACK rvGENavigator::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGENavigator* nav = (rvGENavigator*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGENavigator* nav = (rvGENavigator*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -273,16 +273,16 @@ LRESULT CALLBACK rvGENavigator::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LP
 			// Attach the class to the window first
 			cs = (LPCREATESTRUCT) lParam;
 			nav = (rvGENavigator*) cs->lpCreateParams;
-			SetWindowLong ( hWnd, GWL_USERDATA, (LONG)nav );
+			SetWindowLongPtr( hWnd, GWLP_USERDATA, (LONG_PTR)nav );
 
 			// Create the List view
 			nav->mTree = CreateWindowEx ( 0, "SysListView32", "", WS_VSCROLL|WS_CHILD|WS_VISIBLE|LVS_REPORT|LVS_OWNERDRAWFIXED|LVS_NOCOLUMNHEADER|LVS_SHOWSELALWAYS, 0, 0, 0, 0, hWnd, (HMENU)IDC_GUIED_WINDOWTREE, win32.hInstance, 0 );
 			ListView_SetExtendedListViewStyle ( nav->mTree, LVS_EX_FULLROWSELECT );
 			ListView_SetBkColor ( nav->mTree, GetSysColor ( COLOR_3DFACE ) );
 			ListView_SetTextBkColor ( nav->mTree, GetSysColor ( COLOR_3DFACE ) );
-			nav->mListWndProc = (WNDPROC)GetWindowLong ( nav->mTree, GWL_WNDPROC );
-			SetWindowLong ( nav->mTree, GWL_USERDATA, (LONG)nav );
-			SetWindowLong ( nav->mTree, GWL_WNDPROC, (LONG)ListWndProc );
+			nav->mListWndProc = (WNDPROC)GetWindowLongPtr ( nav->mTree, GWLP_WNDPROC );
+			SetWindowLongPtr( nav->mTree, GWLP_USERDATA, (LONG_PTR)nav );
+			SetWindowLongPtr( nav->mTree, GWLP_WNDPROC, (LONG_PTR)ListWndProc );
 
 			// Insert the only column
 			col.mask = 0;
@@ -429,7 +429,7 @@ Window Procedure for the embedded list control
 */
 LRESULT CALLBACK rvGENavigator::ListWndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGENavigator* nav = (rvGENavigator*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGENavigator* nav = (rvGENavigator*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 	assert ( nav );
 
 	switch ( msg )
@@ -472,7 +472,7 @@ void rvGENavigator::AddWindow ( idWindow* window )
 	ZeroMemory ( &item, sizeof(item) );
 	item.mask = LVIF_PARAM|LVIF_STATE|LVIF_IMAGE;
 	item.iItem = ListView_GetItemCount ( mTree );
-	item.lParam = (LONG) window;
+	item.lParam = (LONG_PTR) window;
 	item.iImage = 0;
 	item.state = rvGEWindowWrapper::GetWrapper(window)->IsSelected ()? LVIS_SELECTED:0;
 	item.stateMask = LVIS_SELECTED;
diff --git a/neo/tools/guied/GEProperties.cpp b/neo/tools/guied/GEProperties.cpp
index a157186..c591a55 100644
--- a/neo/tools/guied/GEProperties.cpp
+++ b/neo/tools/guied/GEProperties.cpp
@@ -199,7 +199,7 @@ Window Procedure for the properties window
 */
 LRESULT CALLBACK rvGEProperties::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEProperties* kv = (rvGEProperties*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGEProperties* kv = (rvGEProperties*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	if ( kv && kv->mGrid.ReflectMessage ( hWnd, msg, wParam, lParam ) )
 	{
@@ -271,7 +271,7 @@ LRESULT CALLBACK rvGEProperties::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, L
 			// Attach the class to the window first
 			cs = (LPCREATESTRUCT) lParam;
 			kv = (rvGEProperties*) cs->lpCreateParams;
-			SetWindowLong ( hWnd, GWL_USERDATA, (LONG)kv );
+			SetWindowLongPtr( hWnd, GWLP_USERDATA, (LONG_PTR)kv );
 
 			kv->mGrid.Create ( hWnd, 999, PGS_ALLOWINSERT );
 
diff --git a/neo/tools/guied/GEPropertyPage.cpp b/neo/tools/guied/GEPropertyPage.cpp
index f937d8e..4264f66 100644
--- a/neo/tools/guied/GEPropertyPage.cpp
+++ b/neo/tools/guied/GEPropertyPage.cpp
@@ -47,7 +47,7 @@ Window procedure for the property page class.
 */
 INT_PTR CALLBACK rvGEPropertyPage::WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEPropertyPage* page = (rvGEPropertyPage*) GetWindowLong ( hwnd, GWL_USERDATA );
+	rvGEPropertyPage* page = (rvGEPropertyPage*) GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 
 	// Pages dont get the init dialog since their Init method is called instead
 	if ( msg == WM_INITDIALOG )
@@ -56,7 +56,7 @@ INT_PTR CALLBACK rvGEPropertyPage::WndProc ( HWND hwnd, UINT msg, WPARAM wParam,
 
 		page = (rvGEPropertyPage*) psp->lParam;
 
-		SetWindowLong ( hwnd, GWL_USERDATA, (LONG)page );
+		SetWindowLongPtr( hwnd, GWLP_USERDATA, (LONG_PTR)page );
 		page->mPage = hwnd;
 
 		page->Init ( );
@@ -89,7 +89,7 @@ int rvGEPropertyPage::HandleMessage ( UINT msg, WPARAM wParam, LPARAM lParam )
 				case PSN_APPLY:
 					if ( !Apply ( ) )
 					{
-						SetWindowLong ( mPage, DWL_MSGRESULT, PSNRET_INVALID );
+						SetWindowLongPtr ( mPage, DWLP_MSGRESULT, PSNRET_INVALID );
 						return TRUE;
 					}
 					break;
diff --git a/neo/tools/guied/GEStateModifier.cpp b/neo/tools/guied/GEStateModifier.cpp
index 6a65598..c484844 100644
--- a/neo/tools/guied/GEStateModifier.cpp
+++ b/neo/tools/guied/GEStateModifier.cpp
@@ -36,7 +36,9 @@ rvGEStateModifier::rvGEStateModifier ( const char* name, idWindow* window, idDic
 	rvGEModifier ( name, window ),
 	mDict ( dict )
 {
-	mDict.Copy ( dict );
+	//Ross T 1/6/2015 - commented out this mDict copy because it seems completely 
+	//redundant (copy constructor happens two lines above) and was causing a bug with adding keys
+	//mDict.Copy ( dict );
 
 	// Make a copy of the current dictionary
 	mUndoDict.Copy ( mWrapper->GetStateDict() );
diff --git a/neo/tools/guied/GEStatusBar.cpp b/neo/tools/guied/GEStatusBar.cpp
index fefc41f..5a51ccd 100644
--- a/neo/tools/guied/GEStatusBar.cpp
+++ b/neo/tools/guied/GEStatusBar.cpp
@@ -93,7 +93,7 @@ void rvGEStatusBar::Update ( void )
 	{
 		parts[0] = -1;
 
-		SendMessage ( mWnd, SB_SETPARTS, 1, (LONG)parts );
+		SendMessage ( mWnd, SB_SETPARTS, 1, (LONG_PTR)parts );
 		SendMessage ( mWnd, SB_SETTEXT, 1, (LPARAM) "" );
 	}
 	else
@@ -107,7 +107,7 @@ void rvGEStatusBar::Update ( void )
 		parts[3] = parts[2] + 40;
 		parts[4] = -1;
 
-		SendMessage ( mWnd, SB_SETPARTS, 5, (LONG)parts );
+		SendMessage ( mWnd, SB_SETPARTS, 5, (LONG_PTR)parts );
 		SendMessage ( mWnd, SB_SETTEXT, 0, (LPARAM) "" );
 		SendMessage ( mWnd, SB_SETTEXT, 1, (LPARAM) va(" Tris: %d", mTriangles ) );
 		SendMessage ( mWnd, SB_SETTEXT, 2, (LPARAM) va(" Zoom: %d%%", mZoom ) );
diff --git a/neo/tools/guied/GETransformer.cpp b/neo/tools/guied/GETransformer.cpp
index cb93567..03b2781 100644
--- a/neo/tools/guied/GETransformer.cpp
+++ b/neo/tools/guied/GETransformer.cpp
@@ -92,7 +92,7 @@ bool rvGETransformer::Create ( HWND parent, bool visible )
 
 LRESULT CALLBACK rvGETransformer::WndProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGETransformer* trans = (rvGETransformer*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGETransformer* trans = (rvGETransformer*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -117,7 +117,7 @@ LRESULT CALLBACK rvGETransformer::WndProc ( HWND hWnd, UINT msg, WPARAM wParam,
 			// Attach the class to the window first
 			cs = (LPCREATESTRUCT) lParam;
 			trans = (rvGETransformer*) cs->lpCreateParams;
-			SetWindowLong ( hWnd, GWL_USERDATA, (LONG)trans );
+			SetWindowLongPtr( hWnd, GWLP_USERDATA, (LONG_PTR)trans );
 
 			trans->mWnd = hWnd;
 			trans->mDlg = CreateDialogParam ( gApp.GetInstance(), MAKEINTRESOURCE(IDD_GUIED_TRANSFORMER),
@@ -148,7 +148,7 @@ LRESULT CALLBACK rvGETransformer::WndProc ( HWND hWnd, UINT msg, WPARAM wParam,
 
 INT_PTR CALLBACK rvGETransformer::DlgProc ( HWND hWnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGETransformer* trans = (rvGETransformer*) GetWindowLong ( hWnd, GWL_USERDATA );
+	rvGETransformer* trans = (rvGETransformer*) GetWindowLongPtr ( hWnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -163,7 +163,7 @@ INT_PTR CALLBACK rvGETransformer::DlgProc ( HWND hWnd, UINT msg, WPARAM wParam,
 		case WM_INITDIALOG:
 			trans = (rvGETransformer*) lParam;
 			trans->mDlg = hWnd;
-			SetWindowLong ( hWnd, GWL_USERDATA, lParam );
+			SetWindowLongPtr ( hWnd, GWLP_USERDATA, lParam );
 			NumberEdit_Attach ( GetDlgItem ( hWnd, IDC_GUIED_ITEMRECTX ) );
 			NumberEdit_Attach ( GetDlgItem ( hWnd, IDC_GUIED_ITEMRECTY ) );
 			NumberEdit_Attach ( GetDlgItem ( hWnd, IDC_GUIED_ITEMRECTW ) );
diff --git a/neo/tools/guied/GEViewer.cpp b/neo/tools/guied/GEViewer.cpp
index f7c16d7..826631f 100644
--- a/neo/tools/guied/GEViewer.cpp
+++ b/neo/tools/guied/GEViewer.cpp
@@ -229,7 +229,7 @@ static int MapKey (int key)
 
 LRESULT CALLBACK rvGEViewer::WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARAM lParam )
 {
-	rvGEViewer* viewer = (rvGEViewer*) GetWindowLong ( hwnd, GWL_USERDATA );
+	rvGEViewer* viewer = (rvGEViewer*) GetWindowLongPtr ( hwnd, GWLP_USERDATA );
 
 	switch ( msg )
 	{
@@ -365,7 +365,7 @@ LRESULT CALLBACK rvGEViewer::WndProc ( HWND hwnd, UINT msg, WPARAM wParam, LPARA
 		case WM_CREATE:
 		{
 			CREATESTRUCT* cs = (CREATESTRUCT*) lParam;
-			SetWindowLong ( hwnd, GWL_USERDATA, (LONG)cs->lpCreateParams );
+			SetWindowLongPtr( hwnd, GWLP_USERDATA, (LONG_PTR)cs->lpCreateParams );
 
 			viewer = (rvGEViewer*)cs->lpCreateParams;
 			viewer->mWnd = hwnd;
@@ -453,7 +453,7 @@ bool rvGEViewer::SetupPixelFormat ( void )
 	HDC	 hDC    = GetDC ( mWnd );
 	bool result = true;
 
-	int pixelFormat = ChoosePixelFormat(hDC, &win32.pfd);
+	int pixelFormat = Win_ChoosePixelFormat(hDC);
 	if (pixelFormat > 0)
 	{
 		if (SetPixelFormat(hDC, pixelFormat, &win32.pfd) == NULL)
diff --git a/neo/tools/guied/GEWindowWrapper.cpp b/neo/tools/guied/GEWindowWrapper.cpp
index 9f11338..a71aabe 100644
--- a/neo/tools/guied/GEWindowWrapper.cpp
+++ b/neo/tools/guied/GEWindowWrapper.cpp
@@ -68,10 +68,21 @@ rvGEWindowWrapper::rvGEWindowWrapper( idWindow *window,EWindowType type ) {
 	}
 
 	// Attach the wrapper to the window by adding a defined variable
-	// with the wrappers pointer stuffed into an integer
+	// with the wrapper's pointer stuffed into (an integer) - actually a string now
+#if 0
 	idWinInt *var = new idWinInt();
 	int x = (int)this;
 	*var = x;
+#else // DG: use idWinStr, because idWinInt can't hold 64bit pointers
+	idWinStr* var = new idWinStr();
+
+	// convert this to hex-string (*without* "0x" prefix)
+	const ULONG_PTR thisULP = (ULONG_PTR)this;
+	char buf[32] = {0};
+	_ui64toa(thisULP, buf, 16);
+
+	var->Set(buf);
+#endif
 	var->SetEval(false);
 	var->SetName("guied_wrapper");
 	mWindow->AddDefinedVar(var);
@@ -87,9 +98,19 @@ Static method that returns the window wrapper for the given window class
 ================
 */
 rvGEWindowWrapper * rvGEWindowWrapper::GetWrapper( idWindow *window ) {
+#if 0
 	idWinInt *var;
 	var = dynamic_cast< idWinInt*>(window->GetWinVarByName("guied_wrapper"));
-	return var ? ((rvGEWindowWrapper *) (int) (*var)) : NULL;
+	return var ? ((rvGEWindowWrapper *)(int) (*var)) : NULL;
+#else
+	// DG: use idWinStr, because idWinInt can't hold 64bit pointers
+	idWinStr* var = (idWinStr*)window->GetWinVarByName("guied_wrapper");
+	if(var == NULL)
+		return NULL;
+
+	ULONG_PTR thisULP = (ULONG_PTR)_strtoui64(var->c_str(), NULL, 16);
+	return (rvGEWindowWrapper *)thisULP;
+#endif
 }
 
 /*
diff --git a/neo/tools/guied/GEWorkspace.cpp b/neo/tools/guied/GEWorkspace.cpp
index 8c98427..c5bcdac 100644
--- a/neo/tools/guied/GEWorkspace.cpp
+++ b/neo/tools/guied/GEWorkspace.cpp
@@ -144,7 +144,7 @@ bool rvGEWorkspace::Attach ( HWND wnd )
 
 	// Jam the workspace pointer into the userdata window long so
 	// we can retrieve the workspace from the window later
-	SetWindowLong ( mWnd, GWL_USERDATA, (LONG) this );
+	SetWindowLongPtr( mWnd, GWLP_USERDATA, ( LONG_PTR ) this );
 
 	UpdateTitle ( );
 
@@ -162,7 +162,7 @@ void rvGEWorkspace::Detach ( void )
 {
 	assert ( mWnd );
 
-	SetWindowLong ( mWnd, GWL_USERDATA, 0 );
+	SetWindowLongPtr ( mWnd, GWLP_USERDATA, 0 );
 	mWnd = NULL;
 }
 
@@ -178,7 +178,7 @@ bool rvGEWorkspace::SetupPixelFormat ( void )
 	HDC	 hDC    = GetDC ( mWnd );
 	bool result = true;
 
-	int pixelFormat = ChoosePixelFormat(hDC, &win32.pfd);
+	int pixelFormat = Win_ChoosePixelFormat(hDC);
 	if (pixelFormat > 0)
 	{
 		if (SetPixelFormat(hDC, pixelFormat, &win32.pfd) == NULL)
@@ -580,31 +580,31 @@ void rvGEWorkspace::UpdateCursor ( rvGESelectionMgr::EHitTest type )
 	switch ( type )
 	{
 		case rvGESelectionMgr::HT_SELECT:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 			break;
 
 		case rvGESelectionMgr::HT_MOVE:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZEALL) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_SIZEALL ) );
 			break;
 
 		case rvGESelectionMgr::HT_SIZE_LEFT:
 		case rvGESelectionMgr::HT_SIZE_RIGHT:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZEWE ) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_SIZEWE ) );
 			break;
 
 		case rvGESelectionMgr::HT_SIZE_TOP:
 		case rvGESelectionMgr::HT_SIZE_BOTTOM:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZENS ) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_SIZENS ) );
 			break;
 
 		case rvGESelectionMgr::HT_SIZE_TOPRIGHT:
 		case rvGESelectionMgr::HT_SIZE_BOTTOMLEFT:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZENESW ) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_SIZENESW ) );
 			break;
 
 		case rvGESelectionMgr::HT_SIZE_BOTTOMRIGHT:
 		case rvGESelectionMgr::HT_SIZE_TOPLEFT:
-			SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_SIZENWSE ) ) );
+			SetCursor ( LoadCursor ( NULL, IDC_SIZENWSE ) );
 			break;
 	}
 }
@@ -627,7 +627,7 @@ void rvGEWorkspace::UpdateCursor ( float x, float y )
 	}
 	else
 	{
-		SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW ) ) );
+		SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 	}
 }
 
@@ -887,7 +887,7 @@ int	rvGEWorkspace::HandleRButtonDown ( WPARAM wParam, LPARAM lParam )
 		AppendMenu ( popup, MF_STRING|MF_ENABLED|(wrapper->IsSelected()?MF_CHECKED:0), ID_GUIED_SELECT_FIRST + i, mSelectMenu[i]->GetName() );
 	}
 
-	InsertMenu ( menu, 1, MF_POPUP|MF_BYPOSITION, (LONG) popup, "Select" );
+	InsertMenu ( menu, 1, MF_POPUP|MF_BYPOSITION, (UINT_PTR) popup, "Select" );
 
 	// Bring up the popup menu
 	ClientToScreen ( mWnd, &point );
diff --git a/neo/tools/guied/GEWorkspace.h b/neo/tools/guied/GEWorkspace.h
index 1700272..068d87b 100644
--- a/neo/tools/guied/GEWorkspace.h
+++ b/neo/tools/guied/GEWorkspace.h
@@ -267,7 +267,7 @@ ID_INLINE rvGEWorkspace::EZoomLevel rvGEWorkspace::GetZoom ( void )
 
 ID_INLINE rvGEWorkspace* rvGEWorkspace::GetWorkspace ( HWND wnd )
 {
-	return (rvGEWorkspace*) GetWindowLong ( wnd, GWL_USERDATA );
+	return (rvGEWorkspace*) GetWindowLongPtr ( wnd, GWLP_USERDATA );
 }
 
 ID_INLINE const char* rvGEWorkspace::GetFilename ( void )
diff --git a/neo/tools/guied/GEWorkspaceFile.cpp b/neo/tools/guied/GEWorkspaceFile.cpp
index de1e37a..ef68fe6 100644
--- a/neo/tools/guied/GEWorkspaceFile.cpp
+++ b/neo/tools/guied/GEWorkspaceFile.cpp
@@ -45,7 +45,7 @@ bool rvGEWorkspace::SaveFile ( const char* filename )
 	idFile*		file;
 	idWindow*	window;
 
-	SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_WAIT ) ) );
+	SetCursor ( LoadCursor ( NULL, IDC_WAIT ) );
 
 	mFilename = filename;
 
@@ -58,10 +58,10 @@ bool rvGEWorkspace::SaveFile ( const char* filename )
 	ospath = fileSystem->RelativePathToOSPath ( tempfile, "fs_savepath" );
 
 	// Open the output file for write
-	if ( !(file = fileSystem->OpenFileWrite ( tempfile ) ) )
+	file = fileSystem->OpenFileWrite(tempfile);
+	if ( !file )
 	{
-		int error = GetLastError ( );
-		SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW ) ) );
+		SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 		return false;
 	}
 
@@ -74,7 +74,7 @@ bool rvGEWorkspace::SaveFile ( const char* filename )
 	if ( !CopyFile ( ospath, filename, FALSE ) )
 	{
 		DeleteFile ( ospath );
-		SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW ) ) );
+		SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 		return false;
 	}
 
@@ -85,7 +85,7 @@ bool rvGEWorkspace::SaveFile ( const char* filename )
 	mNew      = false;
 	UpdateTitle ( );
 
-	SetCursor ( LoadCursor ( NULL, MAKEINTRESOURCE(IDC_ARROW ) ) );
+	SetCursor ( LoadCursor ( NULL, IDC_ARROW ) );
 
 	return true;
 }
diff --git a/neo/tools/materialeditor/MEMainFrame.cpp b/neo/tools/materialeditor/MEMainFrame.cpp
index a8f4a6a..da41c73 100644
--- a/neo/tools/materialeditor/MEMainFrame.cpp
+++ b/neo/tools/materialeditor/MEMainFrame.cpp
@@ -331,8 +331,12 @@ void MEMainFrame::OnDestroy() {
 */
 void MEMainFrame::OnSize(UINT nType, int cx, int cy)
 {
+
 	CFrameWnd::OnSize(nType, cx, cy);
 
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s5 = int(5 * scaling_factor);
+
 	CRect statusRect;
 	m_wndStatusBar.GetWindowRect(statusRect);
 
@@ -342,7 +346,7 @@ void MEMainFrame::OnSize(UINT nType, int cx, int cy)
 	CRect tabRect;
 	m_tabs.GetItemRect(0, tabRect);
 
-	int tabHeight = tabRect.Height()+5;
+	int tabHeight = tabRect.Height()+ s5;
 
 	m_splitterWnd.MoveWindow(0, toolbarRect.Height(), cx, cy-statusRect.Height()-toolbarRect.Height()-tabHeight);
 
diff --git a/neo/tools/materialeditor/MaterialDoc.cpp b/neo/tools/materialeditor/MaterialDoc.cpp
index 479c902..aca567e 100644
--- a/neo/tools/materialeditor/MaterialDoc.cpp
+++ b/neo/tools/materialeditor/MaterialDoc.cpp
@@ -768,7 +768,6 @@ void MaterialDoc::ParseStage(idLexer* src) {
 */
 void MaterialDoc::AddSpecialMapStage(const char* stageName, const char* map) {
 	MEStage_t* newStage = new MEStage_t();
-	int index = editMaterial.stages.Append(newStage);
 	newStage->stageData.Set("name", stageName);
 	newStage->stageData.Set("map", map);
 	newStage->stageData.SetInt("stagetype", STAGE_TYPE_SPECIALMAP);
diff --git a/neo/tools/materialeditor/MaterialEditView.cpp b/neo/tools/materialeditor/MaterialEditView.cpp
index e830730..15ff659 100644
--- a/neo/tools/materialeditor/MaterialEditView.cpp
+++ b/neo/tools/materialeditor/MaterialEditView.cpp
@@ -240,11 +240,22 @@ void MaterialEditView::OnSize(UINT nType, int cx, int cy) {
 	CRect tabRect;
 	m_tabs.GetItemRect(0, tabRect);
 
-	int tabHeight = tabRect.Height()+5;
+
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s2 = int(2 * scaling_factor);
+	int s8 = int(8 * scaling_factor);
+	int s4 = int(4 * scaling_factor);
+	int s5 = int(5 * scaling_factor);
+	int s6 = int(6 * scaling_factor);
+	int s12 = int(12 * scaling_factor);
+	int s16 = int(16 * scaling_factor);
+	int s22 = int(22 * scaling_factor);
+
+	int tabHeight = tabRect.Height()+s5;
 
 	//Hardcode the edit window height
 	if(m_nameEdit.GetSafeHwnd()) {
-		m_nameEdit.MoveWindow(1,1, cx-2, 20);
+		m_nameEdit.MoveWindow(1,1, cx-s2, 20);
 	}
 
 	if(m_tabs.GetSafeHwnd()) {
@@ -252,11 +263,11 @@ void MaterialEditView::OnSize(UINT nType, int cx, int cy) {
 	}
 
 	if(m_editSplitter.GetSafeHwnd()) {
-		m_editSplitter.MoveWindow(1, 22, cx-2, cy-tabHeight-22);
+		m_editSplitter.MoveWindow(1, 22, cx-s2, cy-tabHeight-s22);
 	}
 
 	if(m_textView.GetSafeHwnd()) {
-		m_textView.MoveWindow(1, 22, cx-2, cy-tabHeight-22);
+		m_textView.MoveWindow(1, 22, cx-s2, cy-tabHeight-s22);
 	}
 }
 
diff --git a/neo/tools/materialeditor/MaterialEditor.cpp b/neo/tools/materialeditor/MaterialEditor.cpp
index 3065aca..523927f 100644
--- a/neo/tools/materialeditor/MaterialEditor.cpp
+++ b/neo/tools/materialeditor/MaterialEditor.cpp
@@ -1,3 +1,4 @@
+
 /*
 ===========================================================================
 
@@ -50,7 +51,6 @@ void MaterialEditorInit( void ) {
 
 	com_editors = EDITOR_MATERIAL;
 
-	Sys_GrabMouseCursor( false );
 
 	InitAfx();
 
@@ -131,7 +131,8 @@ void MaterialEditorShutdown( void ) {
 * Allows the doom engine to reflect console output to the material editors console.
 */
 void MaterialEditorPrintConsole( const char *msg ) {
-	if(com_editors & EDITOR_MATERIAL)
+	//meMainFrame can be null when starting immediately from the command line.
+	if(meMainFrame && (com_editors & EDITOR_MATERIAL))
 		meMainFrame->PrintConsoleMessage(msg);
 }
 
diff --git a/neo/tools/materialeditor/MaterialTreeView.cpp b/neo/tools/materialeditor/MaterialTreeView.cpp
index 2a204e8..69f93f6 100644
--- a/neo/tools/materialeditor/MaterialTreeView.cpp
+++ b/neo/tools/materialeditor/MaterialTreeView.cpp
@@ -119,8 +119,6 @@ void MaterialTreeView::InitializeMaterialList(bool includeFile, const char* file
 */
 void MaterialTreeView::BuildMaterialList(bool includeFile, const char* filename) {
 
-	CTreeCtrl& tree = GetTreeCtrl();
-
 	idStrList list(1024);
 
 	int count = declManager->GetNumDecls( DECL_MATERIAL );
@@ -140,10 +138,6 @@ void MaterialTreeView::BuildMaterialList(bool includeFile, const char* filename)
 				continue;
 			}
 
-			if(filename.Find("def") != -1) {
-				int x = 0;
-			}
-
 			if(includeFile) {
 				filename.StripPath();
 				temp = idStr(mat->GetFileName()) + "/" + idStr(mat->GetName()) + "|" + filename;
@@ -1807,7 +1801,7 @@ void MaterialTreeView::SetItemImage(HTREEITEM item, bool mod, bool apply, bool c
 
 	CTreeCtrl& tree = GetTreeCtrl();
 
-	int image;
+	int image = 0;
 
 	DWORD itemType = tree.GetItemData(item);
 	switch(itemType) {
diff --git a/neo/tools/materialeditor/StageView.cpp b/neo/tools/materialeditor/StageView.cpp
index eb58fb5..cca42af 100644
--- a/neo/tools/materialeditor/StageView.cpp
+++ b/neo/tools/materialeditor/StageView.cpp
@@ -365,8 +365,6 @@ void StageView::OnLvnItemchanged(NMHDR *pNMHDR, LRESULT *pResult) {
 * Notifies the property view that all stages have been removed.
 */
 void StageView::OnLvnDeleteallitems(NMHDR *pNMHDR, LRESULT *pResult) {
-	LPNMLISTVIEW pNMLV = reinterpret_cast<LPNMLISTVIEW>(pNMHDR);
-
 	//The list has been cleared so clear the prop view
 	m_propView->SetPropertyListType(-1);
 
diff --git a/neo/tools/materialeditor/ToggleListView.cpp b/neo/tools/materialeditor/ToggleListView.cpp
index 5f718a3..2b52954 100644
--- a/neo/tools/materialeditor/ToggleListView.cpp
+++ b/neo/tools/materialeditor/ToggleListView.cpp
@@ -160,7 +160,8 @@ void ToggleListView::OnSize(UINT nType, int cx, int cy) {
 * Returns the size of each item in the toggle list.
 */
 void ToggleListView::MeasureItem(LPMEASUREITEMSTRUCT lpMeasureItemStruct) {
-	lpMeasureItemStruct->itemHeight = TOGGLELIST_ITEMHEIGHT;
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	lpMeasureItemStruct->itemHeight = UINT(TOGGLELIST_ITEMHEIGHT * scaling_factor);
 }
 
 /**
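Note on the HighDPI changes in the material editor hunks above: hardcoded pixel offsets are multiplied by a per-window scaling factor and truncated to int (for example `int s5 = int(5 * scaling_factor)`). The sketch below illustrates that arithmetic only. The implementation of `Win_GetWindowScalingFactor` is not shown in this part of the diff, so deriving the factor as DPI divided by the 96 DPI Windows baseline is an assumption, and `ScalingFactorForDpi` and `ScalePx` are hypothetical names.

```cpp
// Assumption: the factor is the window DPI relative to the 96 DPI baseline,
// so 96 DPI gives 1.0, 144 DPI gives 1.5, and 192 DPI gives 2.0.
static float ScalingFactorForDpi(unsigned dpi) {
    return static_cast<float>(dpi) / 96.0f;
}

// Scale a design-time pixel value, truncating toward zero like the
// int(N * scaling_factor) expressions in the patch.
static int ScalePx(int designPx, float scalingFactor) {
    return static_cast<int>(designPx * scalingFactor);
}
```

Truncation can lose a pixel at fractional factors (5 px at 150% becomes 7 rather than 8). Rounding would be an alternative, but the sketch keeps the truncating behavior the patch uses.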
diff --git a/neo/tools/pda/DialogPDAEditor.cpp b/neo/tools/pda/DialogPDAEditor.cpp
index a2061e6..3599d4e 100644
--- a/neo/tools/pda/DialogPDAEditor.cpp
+++ b/neo/tools/pda/DialogPDAEditor.cpp
@@ -125,12 +125,14 @@ void PDAEditorInit( const idDict *spawnArgs ) {
 	g_PDAEditorDialog->ShowWindow( SW_SHOW );
 	g_PDAEditorDialog->SetFocus();
 
+#if 0
 	if ( spawnArgs ) {
 		// select PDA based on spawn args
 		const char *name = spawnArgs->GetString( "pda" );
 		idDeclPDA *decl = static_cast<idDeclPDA *>( const_cast<idDecl *>( declManager->FindType( DECL_PDA, name ) ) );
 		// FIXME: select this PDA
 	}
+#endif
 }
 
 void PDAEditorRun( void ) {
diff --git a/neo/tools/radiant/CamWnd.cpp b/neo/tools/radiant/CamWnd.cpp
index 835944d..1e073c3 100644
--- a/neo/tools/radiant/CamWnd.cpp
+++ b/neo/tools/radiant/CamWnd.cpp
@@ -126,7 +126,7 @@ END_MESSAGE_MAP()
  =======================================================================================================================
  =======================================================================================================================
  */
-LONG WINAPI CamWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
+LONG_PTR WINAPI CamWndProc(HWND hWnd, UINT uMsg, WPARAM wParam, LPARAM lParam) {
 	RECT	rect;
 
 	GetClientRect(hWnd, &rect);
@@ -199,7 +199,6 @@ brush_t *g_pSplitList = NULL;
  */
 void CCamWnd::OnPaint() {
 	CPaintDC	dc(this);	// device context for painting
-	bool		bPaint = true;
 
 	if (!qwglMakeCurrent(dc.m_hDC, win32.hGLRC)) {
 		common->Printf("ERROR: wglMakeCurrent failed..\n ");
@@ -361,7 +360,7 @@ int CCamWnd::OnCreate(LPCREATESTRUCT lpCreateStruct) {
 	if( qwglMakeCurrent ( hDC, win32.hGLRC ) == FALSE ) {
 		common->Warning("wglMakeCurrent failed: %d", ::GetLastError());
 		if ( r_multiSamples.GetInteger() > 0 ) {
-			common->Warning("\n!!! Remember to set r_multiSamples 0 when using the editor !!!\n");
+			common->Warning("\n!!! Try setting r_multiSamples 0 when using the editor !!!\n");
 		}
 	}
 
@@ -762,7 +761,7 @@ bool CCamWnd::CullBrush(brush_t *b, bool cubicOnly) {
 		float distance = g_PrefsDlg.m_nCubicScale * 64;
 
 		idVec3 mid;
-		for (int i = 0; i < 3; i++) {
+		for (i = 0; i < 3; i++) {
 			mid[i] = (b->mins[i] + ((b->maxs[i] - b->mins[i]) / 2));
 		}
 
@@ -1065,7 +1064,7 @@ void CCamWnd::Cam_Draw() {
 	int nCount = g_ptrSelectedFaces.GetSize();
 
 	if (!renderMode) {
-		for (int i = 0; i < nCount; i++) {
+		for (i = 0; i < nCount; i++) {
 			face_t	*selFace = reinterpret_cast < face_t * > (g_ptrSelectedFaces.GetAt(i));
 			Face_Draw(selFace);
 			DrawAxial(selFace);
@@ -1079,7 +1078,7 @@ void CCamWnd::Cam_Draw() {
 
 	if (renderMode) {
 		qglColor3f(1, 0, 0);
-		for (int i = 0; i < nCount; i++) {
+		for (i = 0; i < nCount; i++) {
 			face_t	*selFace = reinterpret_cast < face_t * > (g_ptrSelectedFaces.GetAt(i));
 			Face_Draw(selFace);
 		}
@@ -1394,7 +1393,6 @@ void Tris_ToOBJ(const char *outFile, idTriList *tris, idMatList *mats) {
 		int i, j, k;
 		int indexBase = 1;
 		idStr lastMaterial("");
-		int matCount = 0;
 		//idStr basePath = cvarSystem->GetCVarString( "fs_savepath" );
 		f->Printf( "mtllib %s.mtl\n", out );
 		for (i = 0; i < tris->Num(); i++) {
@@ -1429,7 +1427,7 @@ void Tris_ToOBJ(const char *outFile, idTriList *tris, idMatList *mats) {
 				}
 			}
 
-			for (int j = 0; j < tri->numIndexes; j += 3) {
+			for (j = 0; j < tri->numIndexes; j += 3) {
 				int i1, i2, i3;
 				i1 = tri->indexes[j+2] + indexBase;
 				i2 = tri->indexes[j+1] + indexBase;
@@ -1461,7 +1459,7 @@ int Brush_TransformModel(brush_t *brush, idTriList *tris, idMatList *mats) {
 		idRenderModel *model = brush->modelHandle;
 		if (model) {
 			float	a = FloatForKey(brush->owner, "angle");
-			float	s, c;
+			float	s = 0.0f, c = 0.0f;
 			//FIXME: support full rotation matrix
 			bool matrix = false;
 			if (a) {
@@ -1669,7 +1667,11 @@ void Select_ToOBJ() {
 }
 
 void Select_ToCM() {
-	CFileDialog dlgFile( FALSE, "lwo, ase", NULL, 0, "(*.lwo)|*.lwo|(*.ase)|*.ase|(*.ma)|*.ma||", g_pParentWnd );
+#if USE_COLLADA
+	CFileDialog dlgFile( FALSE, "lwo, ase, dae", NULL, 0, "(*.lwo)|*.lwo|(*.ase)|*.ase|(*.ma)|*.ma|(*.dae)|*.dae||", g_pParentWnd );
+#else
+	CFileDialog dlgFile(FALSE, "lwo, ase", NULL, 0, "(*.lwo)|*.lwo|(*.ase)|*.ase|(*.ma)|*.ma||", g_pParentWnd);
+#endif
 
 	if ( dlgFile.DoModal() == IDOK ) {
 		idMapEntity *mapEnt;
@@ -2129,7 +2131,7 @@ void CCamWnd::Cam_Render() {
 }
 
 
-void CCamWnd::OnTimer(UINT nIDEvent)
+void CCamWnd::OnTimer(UINT_PTR nIDEvent)
 {
 	if (animationMode || nIDEvent == 1) {
 		Sys_UpdateWindows(W_CAMERA);
diff --git a/neo/tools/radiant/CamWnd.h b/neo/tools/radiant/CamWnd.h
index 2bbe1bd..0291ec5 100644
--- a/neo/tools/radiant/CamWnd.h
+++ b/neo/tools/radiant/CamWnd.h
@@ -192,7 +192,7 @@ protected:
 	afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct);
 	afx_msg void OnSize(UINT nType, int cx, int cy);
 	afx_msg void OnKeyUp(UINT nChar, UINT nRepCnt, UINT nFlags);
-	afx_msg void OnTimer(UINT nIDEvent);
+	afx_msg void OnTimer(UINT_PTR nIDEvent);
 	//}}AFX_MSG
 	DECLARE_MESSAGE_MAP()
 };
diff --git a/neo/tools/radiant/CapDialog.h b/neo/tools/radiant/CapDialog.h
index e78451e..4e552ba 100644
--- a/neo/tools/radiant/CapDialog.h
+++ b/neo/tools/radiant/CapDialog.h
@@ -41,10 +41,10 @@ class CCapDialog : public CDialog
 {
 // Construction
 public:
-  static enum {BEVEL = 0, ENDCAP, IBEVEL, IENDCAP};
+	enum {BEVEL = 0, ENDCAP, IBEVEL, IENDCAP};
 	CCapDialog(CWnd* pParent = NULL);   // standard constructor
 
-  int getCapType() {return m_nCap;};
+	int getCapType() {return m_nCap;};
 // Dialog Data
 	//{{AFX_DATA(CCapDialog)
 	enum { IDD = IDD_DIALOG_CAP };
diff --git a/neo/tools/radiant/DRAG.CPP b/neo/tools/radiant/DRAG.CPP
index a0a5bdf..48d10f1 100644
--- a/neo/tools/radiant/DRAG.CPP
+++ b/neo/tools/radiant/DRAG.CPP
@@ -556,7 +556,7 @@ static void MoveSelection( const idVec3 &orgMove ) {
 	if (g_pParentWnd->ActiveXY()->ScaleMode()) {
 		idVec3	v;
 		v[0] = v[1] = v[2] = 1.0f;
-		for (int i = 0; i < 3; i++) {
+		for (i = 0; i < 3; i++) {
 			if ( move[i] > 0.0f ) {
 				v[i] = 1.1f;
 			} else if ( move[i] < 0.0f ) {
diff --git a/neo/tools/radiant/DialogTextures.cpp b/neo/tools/radiant/DialogTextures.cpp
index 4714106..a2b5ec8 100644
--- a/neo/tools/radiant/DialogTextures.cpp
+++ b/neo/tools/radiant/DialogTextures.cpp
@@ -501,7 +501,6 @@ void CDialogTextures::OnClickTreeTextures(NMHDR *pNMHDR, LRESULT *pResult) {
  =======================================================================================================================
  */
 void CDialogTextures::OnSelchangedTreeTextures(NMHDR *pNMHDR, LRESULT *pResult) {
-	NM_TREEVIEW *pNMTreeView = (NM_TREEVIEW *) pNMHDR;
 	*pResult = 0;
 
 	editMaterial = NULL;
@@ -827,9 +826,11 @@ void CDialogTextures::addStrList( const char *root, const idStrList &list, int i
  */
 void CDialogTextures::addModels(bool rootItems) {
 	idFileList *files;
-
-	files = fileSystem->ListFilesTree( "models", ".ase|.lwo|.ma", true );
-
+#if USE_COLLADA
+	files = fileSystem->ListFilesTree( "models", ".ase|.lwo|.ma|.dae", true );
+#else
+	files = fileSystem->ListFilesTree("models", ".ase|.lwo|.ma", true);
+#endif
 	if ( files->GetNumFiles() ) {
 		addStrList( TypeNames[MODELS], files->GetList(), MODELS );
 	}
@@ -937,27 +938,32 @@ void CDialogTextures::OnSize(UINT nType, int cx, int cy)
 		return;
 	}
 
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s8 = int(8 * scaling_factor);
+	int s4 = int(4 * scaling_factor);
+	int s12 = int(12 * scaling_factor);
+
 	CRect rect, rect2, rect3;
 	GetClientRect(rect);
 	m_btnLoad.GetWindowRect(rect2);
 
-	m_btnLoad.SetWindowPos(NULL, rect.left + 4, rect.top + 4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
-	m_btnRefresh.SetWindowPos(NULL, rect.left + rect2.Width() + 4, rect.top + 4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
+	m_btnLoad.SetWindowPos(NULL, rect.left + s4, rect.top + s4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
+	m_btnRefresh.SetWindowPos(NULL, rect.left + rect2.Width() + s4, rect.top + s4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
 
 
-	int right = rect.right - 4 - rect3.Width() - 4;
+	int right = rect.right - s4 - rect3.Width() - s4;
 
 
-	right = rect3.right - 4 - rect3.Width() - 4;
+	right = rect3.right - s4 - rect3.Width() - s4;
 
 	m_chkHideRoot.GetWindowRect(rect3);
-	m_chkHideRoot.SetWindowPos(NULL, right - rect3.Width() * 2, rect.top + 4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
+	m_chkHideRoot.SetWindowPos(NULL, right - rect3.Width() * 2, rect.top + s4, 0, 0, SWP_NOSIZE | SWP_SHOWWINDOW);
 	m_chkHideRoot.ShowWindow(SW_HIDE);
 
-	int verticalSpace = (rect.Height() - rect2.Height() - 12) / 2;
+	int verticalSpace = (rect.Height() - rect2.Height() - s12) / 2;
 
-	m_treeTextures.SetWindowPos(NULL, rect.left + 4, rect.top + 8 + rect2.Height(), (rect.Width() - 8), verticalSpace, SWP_SHOWWINDOW);
-	m_wndPreview.SetWindowPos(NULL, rect.left + 4, rect.top + 12 + rect2.Height() + verticalSpace, (rect.Width() - 8), verticalSpace, SWP_SHOWWINDOW);
+	m_treeTextures.SetWindowPos(NULL, rect.left + s4, rect.top + s8 + rect2.Height(), (rect.Width() - s8), verticalSpace, SWP_SHOWWINDOW);
+	m_wndPreview.SetWindowPos(NULL, rect.left + s4, rect.top + s12 + rect2.Height() + verticalSpace, (rect.Width() - s8), verticalSpace, SWP_SHOWWINDOW);
 
 	RedrawWindow();
 }
diff --git a/neo/tools/radiant/DlgCamera.cpp b/neo/tools/radiant/DlgCamera.cpp
index e26f65e..eecac84 100644
--- a/neo/tools/radiant/DlgCamera.cpp
+++ b/neo/tools/radiant/DlgCamera.cpp
@@ -186,7 +186,7 @@ void CDlgCamera::OnSelchangeComboSplines()
 
 void CDlgCamera::OnSelchangeListEvents()
 {
-	int sel = m_wndEvents.GetCurSel();
+	//int sel = m_wndEvents.GetCurSel();
 	//g_splineList->setActiveSegment(sel >= 0 ? sel : 0);
 }
 
diff --git a/neo/tools/radiant/EditViewDlg.cpp b/neo/tools/radiant/EditViewDlg.cpp
index 2fe4d59..dd5bc0a 100644
--- a/neo/tools/radiant/EditViewDlg.cpp
+++ b/neo/tools/radiant/EditViewDlg.cpp
@@ -69,9 +69,19 @@ END_MESSAGE_MAP()
 
 void CEditViewDlg::OnSize(UINT nType, int cx, int cy) {
 	CDialog::OnSize(nType, cx, cy);
+
 	if (GetSafeHwnd() == NULL) {
 		return;
 	}
+
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s2 = int(2 * scaling_factor);
+	int s8 = int(8 * scaling_factor);
+	int s4 = int(4 * scaling_factor);
+	int s6 = int(6 * scaling_factor);
+	int s12 = int(12 * scaling_factor);
+	int s16 = int(16 * scaling_factor);
+
 	CRect rect, crect;
 	GetClientRect(rect);
 	CWnd *wnd = GetDlgItem(IDC_BUTTON_OPEN);
@@ -212,7 +222,7 @@ void CEditViewDlg::SetGuiInfo(const char *name) {
 	UpdateEditPreview();
 }
 
-void CEditViewDlg::OnTimer(UINT nIDEvent) {
+void CEditViewDlg::OnTimer(UINT_PTR nIDEvent) {
 	CDialog::OnTimer(nIDEvent);
 	CWnd *wnd = GetDlgItem(IDC_EDIT_LINE);
 	if (wnd) {
diff --git a/neo/tools/radiant/EditViewDlg.h b/neo/tools/radiant/EditViewDlg.h
index bcb3122..e66f541 100644
--- a/neo/tools/radiant/EditViewDlg.h
+++ b/neo/tools/radiant/EditViewDlg.h
@@ -74,7 +74,7 @@ public:
 	afx_msg void OnBnClickedButtonSave();
 	virtual BOOL OnInitDialog();
 	afx_msg void OnDestroy();
-	afx_msg void OnTimer(UINT nIDEvent);
+	afx_msg void OnTimer(UINT_PTR nIDEvent);
 	afx_msg void OnBnClickedButtonGoto();
 	virtual BOOL PreTranslateMessage(MSG* pMsg);
 	afx_msg LRESULT OnFindDialogMessage(WPARAM wParam, LPARAM lParam);
diff --git a/neo/tools/radiant/EditorBrush.cpp b/neo/tools/radiant/EditorBrush.cpp
index 092eb26..389c104 100644
--- a/neo/tools/radiant/EditorBrush.cpp
+++ b/neo/tools/radiant/EditorBrush.cpp
@@ -1518,10 +1518,6 @@ brush_t *Brush_Parse(idVec3 origin) {
 		// read the texturedef
 		GetToken(false);
 		f->texdef.SetName(token);
-		if (token[0] == '(') {
-			int i = 32;
-		}
-
 		GetToken(false);
 		f->texdef.shift[0] = atoi(token);
 		GetToken(false);
@@ -2063,6 +2059,7 @@ brush_t *Brush_CreatePyramid(idVec3 mins, idVec3 maxs, texdef_t *texdef) {
 	// ++timo handle new brush primitive ? return here ??
 	return Brush_Create(mins, maxs, texdef);
 
+#if 0
 	int i;
 	for (i = 0; i < 3; i++) {
 		if (maxs[i] < mins[i]) {
@@ -2125,6 +2122,7 @@ brush_t *Brush_CreatePyramid(idVec3 mins, idVec3 maxs, texdef_t *texdef) {
 	}
 
 	return b;
+#endif
 }
 
 /*
@@ -2135,7 +2133,7 @@ Brush_MakeSided
 ================
 */
 void Brush_MakeSided(int sides) {
-	int			i, axis;
+	int			i, axis = 0;
 	idVec3		mins, maxs;
 	brush_t		*b;
 	texdef_t	*texdef;
@@ -2537,7 +2535,7 @@ bool Brush_ModelIntersect(brush_t *b, idVec3 origin, idVec3 dir,float &scale) {
 
 		bool matrix = false;
 		idMat3 mat;
-		float a, s, c;
+		float a = 0.0f, s = 0.0f, c = 0.0f;
 		if (GetMatrixForKey(b->owner, "rotation", mat)) {
 			matrix = true;
 		} else {
@@ -3144,7 +3142,6 @@ void Brush_UpdateLightPoints(brush_t *b, const idVec3 &offset) {
 		if (GetVectorForKey(b->owner, "light_center", vCenter)) {
 
 			if (offset.x || offset.y || offset.z) {
-				CString str;
 				VectorAdd(vCenter, offset, vCenter);
 				SetKeyVec3(b->owner, "light_center", vCenter);
 			}
@@ -4356,7 +4353,6 @@ void Brush_DrawCombatNode( brush_t *b, bool cameraView, bool bSelected ) {
 	idVec3 cone_left = leftang.ToForward();
 	idAngles rightang( 0.0f, yaw - fov * 0.5f + 90.0f, 0.0f );
 	idVec3 cone_right = rightang.ToForward();
-	bool disabled = b->owner->epairs.GetBool( "start_off" );
 
 	idVec4 color;
 	if ( bSelected ) {
@@ -4610,7 +4606,7 @@ void Brush_DrawCurve( brush_t *b, bool bSelected, bool cam ) {
 	}
 
 	int maxage = b->owner->curve->GetNumValues();
-	int i, time = 0;
+	int i;
 	qglColor3f( 0.0f, 0.0f, 1.0f );
 	for ( i = 0; i < maxage; i++) {
 
diff --git a/neo/tools/radiant/EditorEntity.cpp b/neo/tools/radiant/EditorEntity.cpp
index 7d6e492..be88b4c 100644
--- a/neo/tools/radiant/EditorEntity.cpp
+++ b/neo/tools/radiant/EditorEntity.cpp
@@ -734,28 +734,28 @@ entity_t *Entity_PostParse(entity_t *ent, brush_t *pList) {
 		}
 
 		if (needsOrigin) {
-			idVec3	mins, maxs, mid;
+			idVec3	entmins, entmaxs, mid;
 			int		i;
 			char	text[32];
-			mins[0] = mins[1] = mins[2] = 999999;
-			maxs[0] = maxs[1] = maxs[2] = -999999;
+			entmins[0] = entmins[1] = entmins[2] = 999999;
+			entmaxs[0] = entmaxs[1] = entmaxs[2] = -999999;
 
 			// add in the origin
 			for (b = ent->brushes.onext; b != &ent->brushes; b = b->onext) {
 				Brush_Build(b, true, false, false);
 				for (i = 0; i < 3; i++) {
-					if (b->mins[i] < mins[i]) {
-						mins[i] = b->mins[i];
+					if (b->mins[i] < entmins[i]) {
+						entmins[i] = b->mins[i];
 					}
 
-					if (b->maxs[i] > maxs[i]) {
-						maxs[i] = b->maxs[i];
+					if (b->maxs[i] > entmaxs[i]) {
+						entmaxs[i] = b->maxs[i];
 					}
 				}
 			}
 
 			for (i = 0; i < 3; i++) {
-				ent->origin[i] = (mins[i] + ((maxs[i] - mins[i]) / 2));
+				ent->origin[i] = (entmins[i] + ((entmaxs[i] - entmins[i]) / 2));
 			}
 
 			sprintf(text, "%i %i %i", (int)ent->origin[0], (int)ent->origin[1], (int)ent->origin[2]);
@@ -809,7 +809,6 @@ entity_t *Entity_Parse(bool onlypairs, brush_t *pList) {
 	ent->brushes.onext = ent->brushes.oprev = &ent->brushes;
 	ent->origin.Zero();
 
-	int n = 0;
 	do {
 		if (!GetToken(true)) {
 			Warning("ParseEntity: EOF without closing brace");
diff --git a/neo/tools/radiant/EditorMap.cpp b/neo/tools/radiant/EditorMap.cpp
index aaa34f3..63946f6 100644
--- a/neo/tools/radiant/EditorMap.cpp
+++ b/neo/tools/radiant/EditorMap.cpp
@@ -51,14 +51,6 @@ entity_t	*world_entity = NULL;	// "classname" "worldspawn" !
 void		AddRegionBrushes(void);
 void		RemoveRegionBrushes(void);
 
-/*
- =======================================================================================================================
- =======================================================================================================================
- */
-void DupLists() {
-	DWORD	dw = GetTickCount();
-}
-
 /*
  * Cross map selection saving this could mess this up if you have only part of a
  * complex entity selected...
diff --git a/neo/tools/radiant/EntityDlg.cpp b/neo/tools/radiant/EntityDlg.cpp
index 618c1de..44949b6 100644
--- a/neo/tools/radiant/EntityDlg.cpp
+++ b/neo/tools/radiant/EntityDlg.cpp
@@ -106,7 +106,7 @@ BOOL CEntityDlg::OnInitDialog()
 	// EXCEPTION: OCX Property Pages should return FALSE
 }
 
-int CEntityDlg::OnToolHitTest(CPoint point, TOOLINFO* pTI) const
+INT_PTR CEntityDlg::OnToolHitTest( CPoint point, TOOLINFO* pTI ) const
 {
 	// TODO: Add your specialized code here and/or call the base class
 
@@ -164,85 +164,94 @@ void CEntityDlg::OnSize(UINT nType, int cx, int cy)
 	CDialog::OnSize(nType, cx, cy);
 	CRect rect, crect, crect2;
 	GetClientRect(rect);
-	int bh = (float)rect.Height() * (rect.Height() - 210) / rect.Height() / 2;
+	float scaling_factor = Win_GetWindowScalingFactor(staticTitle.GetSafeHwnd());
+
+	int s2 =  int(	2 * scaling_factor);
+	int s8 =  int(	8 * scaling_factor);
+	int s4 =  int(	4 * scaling_factor);
+	int s6 =  int(	6 * scaling_factor);
+	int s12 = int( 12 * scaling_factor);
+	int s16 = int( 16 * scaling_factor);
+
+	int bh = (float)rect.Height() * (rect.Height() - (210* scaling_factor)) / rect.Height() / 2;
 	staticTitle.GetWindowRect(crect);
-	staticTitle.SetWindowPos(NULL, 4, 4, rect.Width() -8, crect.Height(), SWP_SHOWWINDOW);
-	int top = 4 + crect.Height() + 4;
+	staticTitle.SetWindowPos(NULL, s4, s4, rect.Width() - s8, crect.Height(), SWP_SHOWWINDOW);
+	int top = crect.Height() + s8;
 	comboClass.GetWindowRect(crect);
 	btnCreate.GetWindowRect(crect2);
-	comboClass.SetWindowPos(NULL, 4, top, rect.Width() - 12 - crect2.Width(), crect.Height(), SWP_SHOWWINDOW);
+	comboClass.SetWindowPos(NULL, s4, top, rect.Width() - s12 - crect2.Width(), crect.Height(), SWP_SHOWWINDOW);
 	btnCreate.SetWindowPos(NULL, rect.Width() - crect2.Width() - 4, top, crect2.Width(), crect.Height(), SWP_SHOWWINDOW);
-	top += crect.Height() + 4;
-	listVars.SetWindowPos(NULL, 4, top, rect.Width() - 8, bh, SWP_SHOWWINDOW);
-	top += bh + 4;
-	listKeyVal.SetWindowPos(NULL, 4, top, rect.Width() - 8, bh, SWP_SHOWWINDOW);
-	top += bh + 4;
+	top += crect.Height() + s4;
+	listVars.SetWindowPos(NULL, s4, top, rect.Width() - s8, bh, SWP_SHOWWINDOW);
+	top += bh + s4;
+	listKeyVal.SetWindowPos(NULL, s4, top, rect.Width() - s8, bh, SWP_SHOWWINDOW);
+	top += bh + s4;
 	staticKey.GetWindowRect(crect);
-	staticKey.SetWindowPos(NULL, 4, top + 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
-	int left = 4 + crect.Width() + 4;
+	staticKey.SetWindowPos(NULL, s4, top + s2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	int left = crect.Width() + s8;
 	int pad = crect.Width();
 	editKey.GetWindowRect(crect);
-	editKey.SetWindowPos(NULL, left, top, rect.Width() - 12 - pad, crect.Height(), SWP_SHOWWINDOW);
-	top += crect.Height() + 4;
+	editKey.SetWindowPos(NULL, left, top, rect.Width() - s12 - pad, crect.Height(), SWP_SHOWWINDOW);
+	top += crect.Height() + s4;
 	staticVal.GetWindowRect(crect);
-	staticVal.SetWindowPos(NULL, 4, top + 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	staticVal.SetWindowPos(NULL, s4, top + s2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
 	editVal.GetWindowRect(crect);
 	bh = crect.Height();
-	editVal.SetWindowPos(NULL, left, top, rect.Width() - 16 - bh - pad, crect.Height(), SWP_SHOWWINDOW);
-	btnBrowse.SetWindowPos(NULL, rect.right - 4 - bh, top, bh, bh, SWP_SHOWWINDOW);
-	top += crect.Height() + 8;
+	editVal.SetWindowPos(NULL, left, top, rect.Width() - s16 - bh - pad, crect.Height(), SWP_SHOWWINDOW);
+	btnBrowse.SetWindowPos(NULL, rect.right - s4 - bh, top, bh, bh, SWP_SHOWWINDOW);
+	top += crect.Height() + s8;
 	btnModel.GetWindowRect(crect);
-	btnModel.SetWindowPos(NULL, rect.right - 4 - crect.Width(), top + 8, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
-	btnSound.SetWindowPos(NULL, rect.right - 4 - crect.Width(), top + 12 + crect.Height(), crect.Width(), crect.Height(), SWP_SHOWWINDOW);
-	btnGui.SetWindowPos(NULL, rect.right - 4 - crect.Width(), top + 16 + crect.Height() * 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
-	btnParticle.SetWindowPos(NULL, rect.right - 8 - (crect.Width() * 2), top + 16 + crect.Height() * 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
-	btnSkin.SetWindowPos( NULL, rect.right - 8 - ( crect.Width() * 2 ), top + 12 + crect.Height(), crect.Width(), crect.Height(), SWP_SHOWWINDOW );
-	btnCurve.SetWindowPos( NULL, rect.right - 8 - ( crect.Width() * 2 ), top + 8, crect.Width(), crect.Height(), SWP_SHOWWINDOW );
+	btnModel.SetWindowPos(NULL, rect.right - s4 - crect.Width(), top + s8, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	btnSound.SetWindowPos(NULL, rect.right - s4 - crect.Width(), top + s12 + crect.Height(), crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	btnGui.SetWindowPos(NULL, rect.right - s4 - crect.Width(), top + s16 + crect.Height() * 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	btnParticle.SetWindowPos(NULL, rect.right - s8 - (crect.Width() * 2), top + s16 + crect.Height() * 2, crect.Width(), crect.Height(), SWP_SHOWWINDOW);
+	btnSkin.SetWindowPos( NULL, rect.right - s8 - ( crect.Width() * 2 ), top + s12 + crect.Height(), crect.Width(), crect.Height(), SWP_SHOWWINDOW );
+	btnCurve.SetWindowPos( NULL, rect.right - s8 - ( crect.Width() * 2 ), top + s8, crect.Width(), crect.Height(), SWP_SHOWWINDOW );
 
 	//*************************************
 	//animation controls
 	//*************************************
-	int rightAnimAreaBorder = rect.right - 75 - crect.Width (); /*models, etc button width*/
+	int rightAnimAreaBorder = rect.right - (75 * scaling_factor) -crect.Width(); /*models, etc button width*/
 
 	btnStopAnim.GetWindowRect(crect);
 	btnStopAnim.SetWindowPos(NULL,rightAnimAreaBorder - crect.Width (),
 		top + 8  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
 
-	left = rightAnimAreaBorder - crect.Width() - 4;
+	left = rightAnimAreaBorder - crect.Width() - s4;
 	btnPlayAnim.GetWindowRect(crect);
-	btnPlayAnim.SetWindowPos(NULL,left-crect.Width () ,top + 8 , crect.Width(),crect.Height(),SWP_SHOWWINDOW);
+	btnPlayAnim.SetWindowPos(NULL,left-crect.Width () ,top + s8 , crect.Width(),crect.Height(),SWP_SHOWWINDOW);
 
-	left -= crect.Width() + 4;
+	left -= crect.Width() + s4;
 	cbAnimations.GetWindowRect(crect);
-	cbAnimations.SetWindowPos(NULL,left-crect.Width (),top + 8  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
+	cbAnimations.SetWindowPos(NULL,left-crect.Width (),top + s8  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
 
 	staticFrame.GetWindowRect(crect);
 	staticFrame.SetWindowPos(NULL,rightAnimAreaBorder - crect.Width (),
-		top + 34  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
+		top + (34 * scaling_factor)  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
 
-	left = rightAnimAreaBorder - crect.Width () - 4;
+	left = rightAnimAreaBorder - crect.Width () - s4;
 
 	slFrameSlider.GetWindowRect(crect);
 	slFrameSlider.SetWindowPos(NULL,left - crect.Width (),
-	top + 32  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
+	top + (32 * scaling_factor)  ,crect.Width(),crect.Height(),SWP_SHOWWINDOW);
 
 	//*************************************
 	//*************************************
 
 	btn135.GetWindowRect(crect);
 	bh = crect.Width();
-	btn135.SetWindowPos(NULL, 4, top, bh, bh, SWP_SHOWWINDOW);
-	btn90.SetWindowPos(NULL, 4 + 2 + bh, top, bh, bh, SWP_SHOWWINDOW);
-	btn45.SetWindowPos(NULL, 4 + 2 + 2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
-	btnUp.SetWindowPos(NULL, 4 + 2 + 2 + 6 + bh * 3, top + bh / 2,bh,bh, SWP_SHOWWINDOW);
-	btnDown.SetWindowPos(NULL, 4 + 2 + 2 + 6 + bh *3, top + bh / 2 + bh + 2,bh,bh, SWP_SHOWWINDOW);
-	top += bh + 2;
-	btn180.SetWindowPos(NULL, 4, top, bh, bh, SWP_SHOWWINDOW);
-	btn360.SetWindowPos(NULL, 4 + 2 + 2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
-	top += bh + 2;
-	btn225.SetWindowPos(NULL, 4, top, bh, bh, SWP_SHOWWINDOW);
-	btn270.SetWindowPos(NULL, 4 + 2 + bh, top, bh, bh, SWP_SHOWWINDOW);
-	btn315.SetWindowPos(NULL, 4 + 2 + 2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
+	btn135.SetWindowPos(NULL, s4, top, bh, bh, SWP_SHOWWINDOW);
+	btn90.SetWindowPos(NULL, s4 + s2 + bh, top, bh, bh, SWP_SHOWWINDOW);
+	btn45.SetWindowPos(NULL, s4 + s2 + s2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
+	btnUp.SetWindowPos(NULL, s4 + s2 + s2 + s6 + bh * 3, top + bh / 2,bh,bh, SWP_SHOWWINDOW);
+	btnDown.SetWindowPos(NULL, s4 + s2 + s2 + s6 + bh *3, top + bh / 2 + bh + s2,bh,bh, SWP_SHOWWINDOW);
+	top += bh + s2;
+	btn180.SetWindowPos(NULL, s4, top, bh, bh, SWP_SHOWWINDOW);
+	btn360.SetWindowPos(NULL, s4 + s2 + s2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
+	top += bh + s2;
+	btn225.SetWindowPos(NULL, s4, top, bh, bh, SWP_SHOWWINDOW);
+	btn270.SetWindowPos(NULL, s4 + s2 + bh, top, bh, bh, SWP_SHOWWINDOW);
+	btn315.SetWindowPos(NULL, s4 + s2 + s2 + bh * 2, top, bh, bh, SWP_SHOWWINDOW);
 	Invalidate();
 }
 
@@ -940,7 +949,7 @@ void CEntityDlg::OnCbnDblclkComboClass()
 // =======================================================================================================================
 //
 void CEntityDlg::CreateEntity() {
-	entity_t	*petNew;
+	entity_t	*petNew = NULL;
 	bool		forceFixed = false;
 
 	// check to make sure we have a brush
@@ -1148,7 +1157,7 @@ void CEntityDlg::OnBnClickedStopAnimation()
 	KillTimer ( 0 );
 }
 
-void CEntityDlg::OnTimer(UINT nIDEvent)
+void CEntityDlg::OnTimer( UINT_PTR nIDEvent )
 {
 	if ( !editEntity ) {
 		OnBnClickedStopAnimation ();
diff --git a/neo/tools/radiant/EntityDlg.h b/neo/tools/radiant/EntityDlg.h
index 708427e..c0b8a54 100644
--- a/neo/tools/radiant/EntityDlg.h
+++ b/neo/tools/radiant/EntityDlg.h
@@ -76,7 +76,7 @@ protected:
 public:
 
 	virtual BOOL OnInitDialog();
-	virtual int OnToolHitTest(CPoint point, TOOLINFO* pTI) const;
+	virtual INT_PTR OnToolHitTest( CPoint point, TOOLINFO* pTI ) const;
 	void AddClassNames();
 	void UpdateEntitySel(eclass_t *ent);
 	void SetKeyValPairs( bool updateAnims = true );
@@ -160,7 +160,7 @@ public:
 	afx_msg void OnLbnDblclkListVars();
 	void OnNMReleasedcaptureSlider1(NMHDR *pNMHDR, LRESULT *pResult);
 	afx_msg void OnCbnAnimationChange ();
-	void OnTimer(UINT nIDEvent);
+	void OnTimer(UINT_PTR nIDEvent);
 	afx_msg void OnBnClickedButtonParticle();
 	afx_msg void OnBnClickedButtonSkin();
 	afx_msg void OnBnClickedButtonCurve();
diff --git a/neo/tools/radiant/EntityListDlg.cpp b/neo/tools/radiant/EntityListDlg.cpp
index 91d8720..ea40f55 100644
--- a/neo/tools/radiant/EntityListDlg.cpp
+++ b/neo/tools/radiant/EntityListDlg.cpp
@@ -146,7 +146,7 @@ void CEntityListDlg::OnLbnSelchangeListEntities()
 			int count = pEntity->epairs.GetNumKeyVals();
 			for (int i = 0; i < count; i++) {
 				int nParent = m_lstEntity.InsertItem(0, pEntity->epairs.GetKeyVal(i)->GetKey());
-				m_lstEntity.SetItem(nParent, 1, LVIF_TEXT, pEntity->epairs.GetKeyVal(i)->GetValue(), 0, 0, 0, reinterpret_cast<DWORD>(pEntity));
+				m_lstEntity.SetItem(nParent, 1, LVIF_TEXT, pEntity->epairs.GetKeyVal(i)->GetValue(), 0, 0, 0, (LPARAM)(pEntity));
 			}
 		}
 	}
diff --git a/neo/tools/radiant/GLWidget.cpp b/neo/tools/radiant/GLWidget.cpp
index 652fa76..558a6d1 100644
--- a/neo/tools/radiant/GLWidget.cpp
+++ b/neo/tools/radiant/GLWidget.cpp
@@ -465,8 +465,8 @@ void idGLDrawableMaterial::draw(int x, int y, int w, int h) {
 		refdef.time = eventLoop->Milliseconds();
 
 		world->RenderScene( &refdef );
-		int frontEnd, backEnd;
-		renderSystem->EndFrame( &frontEnd, &backEnd );
+		int rsFrontEnd, rsBackEnd;
+		renderSystem->EndFrame( &rsFrontEnd, &rsBackEnd );
 
 		qglMatrixMode( GL_MODELVIEW );
 		qglLoadIdentity();
@@ -807,7 +807,7 @@ void idGLWidget::setDrawable(idGLDrawable *d) {
 }
 
 
-void idGLWidget::OnTimer(UINT nIDEvent) {
+void idGLWidget::OnTimer(UINT_PTR nIDEvent) {
 	if (drawable && drawable->getRealTime()) {
 		Invalidate(FALSE);
 	} else {
diff --git a/neo/tools/radiant/GLWidget.h b/neo/tools/radiant/GLWidget.h
index ee8ccff..debf701 100644
--- a/neo/tools/radiant/GLWidget.h
+++ b/neo/tools/radiant/GLWidget.h
@@ -212,7 +212,7 @@ protected:
 	afx_msg BOOL OnMouseWheel(UINT nFlags, short zDelta, CPoint pt);
 	afx_msg void OnRButtonDown(UINT nFlags, CPoint point);
 	afx_msg void OnRButtonUp(UINT nFlags, CPoint point);
-	afx_msg void OnTimer(UINT nIDEvent);
+	afx_msg void OnTimer(UINT_PTR nIDEvent);
 	afx_msg BOOL OnEraseBkgnd(CDC* pDC);
 	//}}AFX_MSG
 
diff --git a/neo/tools/radiant/InspectorDialog.cpp b/neo/tools/radiant/InspectorDialog.cpp
index 865e971..34ff9cd 100644
--- a/neo/tools/radiant/InspectorDialog.cpp
+++ b/neo/tools/radiant/InspectorDialog.cpp
@@ -130,6 +130,10 @@ void CInspectorDialog::OnSize(UINT nType, int cx, int cy)
 	DockedWindowInfo* info = NULL;
 	POSITION pos;
 	WORD wID;
+
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s5 = int(5 * scaling_factor);
+	int s4 = int(4 * scaling_factor);
 
 	if (!initialized) {
 		return;
@@ -141,12 +145,12 @@ void CInspectorDialog::OnSize(UINT nType, int cx, int cy)
 	CRect tabRect;
 	m_Tabs.GetWindowRect(tabRect);
 	// retain vert size but size 4 in from edges and 4 up from bottom
-	tabRect.left = 4;
-	tabRect.right = rect.Width() - 4;
-	tabRect.top = rect.Height() - tabRect.Height() - 4;
-	tabRect.bottom = rect.Height() - 4;
+	tabRect.left = s4;
+	tabRect.right = rect.Width() - s4;
+	tabRect.top = rect.Height() - tabRect.Height() - s4;
+	tabRect.bottom = rect.Height() - s4;
 	// adjust rect for children size
-	rect.bottom -= 5 + tabRect.Height();
+	rect.bottom -= s5 + tabRect.Height();
 
 	m_Tabs.SetWindowPos(NULL, tabRect.left, tabRect.top, tabRect.Width(), tabRect.Height(), 0);
 
diff --git a/neo/tools/radiant/LightDlg.cpp b/neo/tools/radiant/LightDlg.cpp
index 31cf2d1..219d564 100644
--- a/neo/tools/radiant/LightDlg.cpp
+++ b/neo/tools/radiant/LightDlg.cpp
@@ -179,6 +179,8 @@ void CLightInfo::ToDictWriteAllInfo( idDict *e ) {
 
 	if (strTexture.GetLength() > 0 ) {
 		e->Set("texture", strTexture);
+	} else {
+		e->Set("texture", "");
 	}
 
 	idVec3 temp = color;
@@ -940,19 +942,19 @@ void CLightDlg::OnCheckParallel() {
 
 //jhefty - only apply settings that are different
 void CLightDlg::OnApplyDifferences () {
-	idDict differences, modified, original;
+	idDict differences, modifiedlight, originallight;
 
 	UpdateLightInfoFromDialog();
 
-	lightInfo.ToDict( &modified );
-	lightInfoOriginal.ToDictWriteAllInfo( &original );
+	lightInfo.ToDict( &modifiedlight);
+	lightInfoOriginal.ToDictWriteAllInfo( &originallight);
 
-	differences = modified;
+	differences = modifiedlight;
 
 	// jhefty - compile a set of modified values to apply
-	for ( int i = 0; i < modified.GetNumKeyVals (); i ++ ) {
-		const idKeyValue* valModified = modified.GetKeyVal ( i );
-		const idKeyValue* valOriginal = original.FindKey ( valModified->GetKey() );
+	for ( int i = 0; i < modifiedlight.GetNumKeyVals (); i ++ ) {
+		const idKeyValue* valModified = modifiedlight.GetKeyVal ( i );
+		const idKeyValue* valOriginal = originallight.FindKey ( valModified->GetKey() );
 
 		//if it hasn't changed, remove it from the list of values to apply
 		if ( !valOriginal || ( valModified->GetValue() == valOriginal->GetValue() ) ) {
@@ -962,7 +964,7 @@ void CLightDlg::OnApplyDifferences () {
 
 	SaveLightInfo( &differences );
 
-	lightInfoOriginal.FromDict( &modified );
+	lightInfoOriginal.FromDict( &modifiedlight);
 
 	Sys_UpdateWindows( W_ALL );
 }
diff --git a/neo/tools/radiant/MainFrm.cpp b/neo/tools/radiant/MainFrm.cpp
index 9a0bfac..0c49ddb 100644
--- a/neo/tools/radiant/MainFrm.cpp
+++ b/neo/tools/radiant/MainFrm.cpp
@@ -727,8 +727,8 @@ static UINT indicators[] = {
  =======================================================================================================================
  =======================================================================================================================
  */
-void CMainFrame::OnDisplayChange(UINT wParam, long lParam) {
-	int n = wParam;
+void CMainFrame::OnDisplayChange( WPARAM wp, LPARAM lp ) {
+//	int n = wp;
 }
 
 /*
@@ -1025,7 +1025,6 @@ MFCCreate
 */
 void MFCCreate( HINSTANCE hInstance )
 {
-	HMENU hMenu = NULL;
 	int i = sizeof(g_qeglobals.d_savedinfo);
 	long l = i;
 
@@ -1079,8 +1078,8 @@ void MFCCreate( HINSTANCE hInstance )
 
 		// old size was smaller, reload original prefs
 		if (nOldSize > 0 && nOldSize < sizeof(g_qeglobals.d_savedinfo)) {
-			long l = nOldSize;
-			LoadRegistryInfo("radiant_SavedInfo", &g_qeglobals.d_savedinfo, &l);
+			long lOldSize = nOldSize;
+			LoadRegistryInfo("radiant_SavedInfo", &g_qeglobals.d_savedinfo, &lOldSize);
 		}
 	}
 }
@@ -1114,7 +1113,6 @@ int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct) {
 		TRACE0("Failed to create toolbar\n");
 		return -1;	// fail to create
 	}
-
 	if (!m_wndStatusBar.Create(this) || !m_wndStatusBar.SetIndicators(indicators, sizeof(indicators) / sizeof(UINT))) {
 		TRACE0("Failed to create status bar\n");
 		return -1;	// fail to create
@@ -1412,7 +1410,7 @@ bool MouseDown() {
  =======================================================================================================================
  */
 
-void CMainFrame::OnTimer(UINT nIDEvent) {
+void CMainFrame::OnTimer(UINT_PTR nIDEvent) {
 	static bool autoSavePending = false;
 
 	if ( nIDEvent == QE_TIMER0 && !MouseDown() ) {
@@ -1799,23 +1797,24 @@ void CMainFrame::OnSize(UINT nType, int cx, int cy) {
 
 	CRect	rctParent;
 	GetClientRect(rctParent);
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
 
 	UINT	nID;
 	UINT	nStyle;
 	int		nWidth;
 	if (m_wndStatusBar.GetSafeHwnd()) {
 		m_wndStatusBar.GetPaneInfo( 0, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 0, nID, nStyle, rctParent.Width() * 0.15f );
+		m_wndStatusBar.SetPaneInfo( 0, nID, nStyle, rctParent.Width() * 0.15f * scaling_factor);
 		m_wndStatusBar.GetPaneInfo( 1, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 1, nID, nStyle, rctParent.Width() * 0.15f);
+		m_wndStatusBar.SetPaneInfo( 1, nID, nStyle, rctParent.Width() * 0.15f * scaling_factor);
 		m_wndStatusBar.GetPaneInfo( 2, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 2, nID, nStyle, rctParent.Width() * 0.15f );
+		m_wndStatusBar.SetPaneInfo( 2, nID, nStyle, rctParent.Width() * 0.15f * scaling_factor);
 		m_wndStatusBar.GetPaneInfo( 3, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 3, nID, nStyle, rctParent.Width() * 0.39f );
+		m_wndStatusBar.SetPaneInfo( 3, nID, nStyle, rctParent.Width() * 0.39f * scaling_factor);
 		m_wndStatusBar.GetPaneInfo( 4, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 4, nID, nStyle, rctParent.Width() * 0.15f );
+		m_wndStatusBar.SetPaneInfo( 4, nID, nStyle, rctParent.Width() * 0.15f * scaling_factor);
 		m_wndStatusBar.GetPaneInfo( 5, nID, nStyle, nWidth);
-		m_wndStatusBar.SetPaneInfo( 5, nID, nStyle, rctParent.Width() * 0.01f );
+		m_wndStatusBar.SetPaneInfo( 5, nID, nStyle, rctParent.Width() * 0.01f * scaling_factor);
 	}
 }
 
@@ -2134,7 +2133,7 @@ This is the new all-internal bsp
 ============
 */
 void RunBsp (const char *command) {
-	char	sys[2048];
+	char	system[2048];
 	char	name[2048];
 	char	*in;
 
@@ -2182,28 +2181,28 @@ void RunBsp (const char *command) {
 
 		::GetModuleFileName(AfxGetApp()->m_hInstance, buff, sizeof(buff));
 		if (strlen(command) > strlen("bspext")) {
-			idStr::snPrintf( sys, sizeof(sys), "%s %s +set r_fullscreen 0 +dmap editorOutput %s %s +quit", buff, paths.c_str(), command + strlen("bspext"), in );
+			idStr::snPrintf( system, sizeof(system), "%s %s +set r_fullscreen 0 +dmap editorOutput %s %s +quit", buff, paths.c_str(), command + strlen("bspext"), in );
 		} else {
-			idStr::snPrintf( sys, sizeof(sys), "%s %s +set r_fullscreen 0 +dmap editorOutput %s +quit", buff, paths.c_str(), in );
+			idStr::snPrintf( system, sizeof(system), "%s %s +set r_fullscreen 0 +dmap editorOutput %s +quit", buff, paths.c_str(), in );
 		}
 
 		::GetStartupInfo (&startupinfo);
-		if (!CreateProcess(NULL, sys, NULL, NULL, FALSE, 0, NULL, NULL, &startupinfo, &ProcessInformation)) {
+		if (!CreateProcess(NULL, system, NULL, NULL, FALSE, 0, NULL, NULL, &startupinfo, &ProcessInformation)) {
-			common->Printf("Could not start bsp process %s %s/n", buff, sys);
+			common->Printf("Could not start bsp process %s %s\n", buff, system);
 		}
 		g_pParentWnd->SetFocus();
 
 	} else { // assumes bsp is the command
 		if (strlen(command) > strlen("bsp")) {
-			idStr::snPrintf( sys, sizeof(sys), "dmap %s %s", command + strlen("bsp"), in );
+			idStr::snPrintf( system, sizeof(system), "dmap %s %s", command + strlen("bsp"), in );
 		} else {
-			idStr::snPrintf( sys, sizeof(sys), "dmap %s", in );
+			idStr::snPrintf( system, sizeof(system), "dmap %s", in );
 		}
 
 		cmdSystem->BufferCommandText( CMD_EXEC_NOW, "disconnect\n" );
 
 		// issue the bsp command
-		Dmap_f( idCmdArgs( sys, false ) );
+		Dmap_f( idCmdArgs( system, false ) );
 	}
 }
 
@@ -2743,8 +2742,8 @@ LPCSTR String_ToLower(LPCSTR psString)
 bool FindNextBrush(brush_t* pPrevFoundBrush)	// can be NULL for fresh search
 {
 	bool bFoundSomething = false;
-	entity_t *pLastFoundEnt;
-	brush_t  *pLastFoundBrush;
+	entity_t *pLastFoundEnt = NULL;
+	brush_t  *pLastFoundBrush = NULL;
 
 	CWaitCursor waitcursor;
 
@@ -3007,8 +3006,7 @@ void CMainFrame::OnMiscSetViewPos()
 		if (iArgsFound == 3)
 		{
 			// try for an optional 4th (note how this wasn't part of the sscanf() above, so I can check 1st-3, not just any 3)
-			//
-			int iArgsFound = sscanf(psNewCoords,"%f %f %f %f", &v3Viewpos[0], &v3Viewpos[1], &v3Viewpos[2], &fYaw);
+			iArgsFound = sscanf(psNewCoords,"%f %f %f %f", &v3Viewpos[0], &v3Viewpos[1], &v3Viewpos[2], &fYaw);
 			if (iArgsFound != 4)
 			{
 				fYaw = 0;	// jic
@@ -6361,10 +6359,9 @@ void CMainFrame::OnShowLightvolumes() {
  =======================================================================================================================
  */
 void CMainFrame::OnActivate(UINT nState, CWnd *pWndOther, BOOL bMinimized) {
-	CFrameWnd::OnActivate(nState, pWndOther, bMinimized);
-
+	CFrameWnd::OnActivate(nState, pWndOther, bMinimized);
 	if ( nState != WA_INACTIVE ) {
-		common->ActivateTool( true );
+		common->ActivateTool(true);
 		if (::IsWindowVisible(win32.hWnd)) {
 			::ShowWindow(win32.hWnd, SW_HIDE);
 		}
diff --git a/neo/tools/radiant/MainFrm.h b/neo/tools/radiant/MainFrm.h
index f577768..bf4000d 100644
--- a/neo/tools/radiant/MainFrm.h
+++ b/neo/tools/radiant/MainFrm.h
@@ -180,7 +180,7 @@ public:
 	afx_msg void OnBSPDone(UINT wParam, long lParam);
 	afx_msg void OnParentNotify(UINT message, LPARAM lParam);
 	afx_msg int OnCreate(LPCREATESTRUCT lpCreateStruct);
-	afx_msg void OnTimer(UINT nIDEvent);
+	afx_msg void OnTimer(UINT_PTR nIDEvent);
 	afx_msg void OnDestroy();
 	afx_msg void OnClose();
 	afx_msg void OnKeyDown(UINT nChar, UINT nRepCnt, UINT nFlags);
diff --git a/neo/tools/radiant/MapInfo.cpp b/neo/tools/radiant/MapInfo.cpp
index dca3d3c..3703646 100644
--- a/neo/tools/radiant/MapInfo.cpp
+++ b/neo/tools/radiant/MapInfo.cpp
@@ -91,9 +91,9 @@ BOOL CMapInfo::OnInitDialog()
 
   CMapStringToPtr mapEntity;
 
-  int nValue = 0;
-	for (entity_t* pEntity=entities.next ; pEntity != &entities ; pEntity=pEntity->next)
-	{
+  intptr_t nValue = 0;
+  for (entity_t* pEntity=entities.next ; pEntity != &entities ; pEntity=pEntity->next)
+  {
 	m_nTotalEntities++;
 	nValue = 0;
 	mapEntity.Lookup(pEntity->eclass->name, reinterpret_cast<void*&>(nValue));
diff --git a/neo/tools/radiant/MediaPreviewDlg.cpp b/neo/tools/radiant/MediaPreviewDlg.cpp
index 750c324..fa2db57 100644
--- a/neo/tools/radiant/MediaPreviewDlg.cpp
+++ b/neo/tools/radiant/MediaPreviewDlg.cpp
@@ -108,6 +108,10 @@ BOOL CMediaPreviewDlg::OnInitDialog()
 
 void CMediaPreviewDlg::OnSize(UINT nType, int cx, int cy)
 {
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s8 = int(8 * scaling_factor);
+	int s4 = int(4 * scaling_factor);
+
 	CDialog::OnSize(nType, cx, cy);
 	if (wndPreview.GetSafeHwnd() == NULL) {
 		return;
@@ -115,8 +119,8 @@ void CMediaPreviewDlg::OnSize(UINT nType, int cx, int cy)
 	CRect rect;
 	GetClientRect(rect);
 	//int h = (mode == GUIS) ? (rect.Width() - 8) / 1.333333f : rect.Height() - 8;
-	int h = rect.Height() - 8;
-	wndPreview.SetWindowPos(NULL, 4, 4, rect.Width() - 8, h, SWP_SHOWWINDOW);
+	int h = rect.Height() - s8;
+	wndPreview.SetWindowPos(NULL, s4, s4, rect.Width() - s8, h, SWP_SHOWWINDOW);
 }
 
 void CMediaPreviewDlg::OnDestroy()
diff --git a/neo/tools/radiant/NewTexWnd.cpp b/neo/tools/radiant/NewTexWnd.cpp
index 10ec3b0..dd85459 100644
--- a/neo/tools/radiant/NewTexWnd.cpp
+++ b/neo/tools/radiant/NewTexWnd.cpp
@@ -878,7 +878,7 @@ BOOL CNewTexWnd::OnToolTipNotify( UINT id, NMHDR * pNMHDR, LRESULT * pResult ) {
 	return(FALSE);
 }
 
-int CNewTexWnd::OnToolHitTest(CPoint point, TOOLINFO * pTI)
+INT_PTR CNewTexWnd::OnToolHitTest(CPoint point, TOOLINFO * pTI)
 {
 	const idMaterial *mat = getMaterialAtPoint(point);
 	if (mat) {
diff --git a/neo/tools/radiant/NewTexWnd.h b/neo/tools/radiant/NewTexWnd.h
index b056141..b41a501 100644
--- a/neo/tools/radiant/NewTexWnd.h
+++ b/neo/tools/radiant/NewTexWnd.h
@@ -71,7 +71,7 @@ public:
 	void LoadMaterials();
 	virtual ~CNewTexWnd();
 	BOOL OnToolTipNotify( UINT id, NMHDR * pNMHDR, LRESULT * pResult );
-	int CNewTexWnd::OnToolHitTest(CPoint point, TOOLINFO * pTI);
+	INT_PTR CNewTexWnd::OnToolHitTest(CPoint point, TOOLINFO * pTI);
 	virtual BOOL PreTranslateMessage(MSG* pMsg);
 
 protected:
diff --git a/neo/tools/radiant/PMESH.CPP b/neo/tools/radiant/PMESH.CPP
index 9decaab..5943978 100644
--- a/neo/tools/radiant/PMESH.CPP
+++ b/neo/tools/radiant/PMESH.CPP
@@ -59,9 +59,6 @@ patchMesh_t * MakeNewPatch( int width,int height ) {
 
 void Patch_AdjustSize( patchMesh_t *p,int wadj,int hadj ) {
 	idDrawVert	*newverts	= reinterpret_cast< idDrawVert*>(Mem_ClearedAlloc(sizeof(idDrawVert) * (p->width + wadj) * (p->height + hadj)));
-	int			copyWidth	= (wadj < 0) ? p->width + wadj : p->width;
-	int			copyHeight	= (hadj < 0) ? p->height + hadj : p->height;
-	int			copysize	= copyWidth *copyHeight * sizeof(idDrawVert);
 
 	for ( int i = 0; i < p->width; i++ ) {
 		for ( int j = 0; j < p->height; j++ ) {
@@ -1325,7 +1322,6 @@ brush_t * CapSpecial( patchMesh_t *pParent,int nType,bool bFirst ) {
 	vMin[0] = vMin[1] = vMin[2] = 99999;
 	vMax[0] = vMax[1] = vMax[2] = -99999;
 
-	int	nSize	= pParent->width;
 	int	nIndex	= (bFirst) ? 0 : pParent->height - 1;
 
 	// parent bounds are used for some things
@@ -1506,8 +1502,8 @@ void Patch_CapCurrent( bool bInvertedBevel,bool bInvertedEndcap ) {
 //FIXME: Table drive all this crap
 //
 void GenerateEndCaps( brush_t *brushParent,bool bBevel,bool bEndcap,bool bInverted ) {
-	brush_t		*b, *b2;
-	patchMesh_t	*p, *p2, *pParent;
+	brush_t		*b;
+	patchMesh_t	*p, *pParent;
 	idVec3		vTemp, vMin, vMax;
 	int			i, j;
 
@@ -1549,6 +1545,7 @@ void GenerateEndCaps( brush_t *brushParent,bool bBevel,bool bEndcap,bool bInvert
 	Select_Brush(p->pSymbiot);
 	return;
 
+#if 0
 	bool	bCreated	= false;
 
 	if ( bInverted ) {
@@ -1712,7 +1709,7 @@ void GenerateEndCaps( brush_t *brushParent,bool bBevel,bool bEndcap,bool bInvert
 		Select_Delete();
 	}
 	//Select_Brush(brushParent);
-
+#endif
 }
 
 
@@ -2207,9 +2204,7 @@ DrawPatchMesh
 //FIXME: this routine needs to be reorganized.. should be about 1/4 the size and complexity
 void DrawPatchMesh( patchMesh_t *pm,bool bPoints,int *list,bool bShade = false ) {
 	int		i, j;
-
 	bool	bOverlay	= pm->bOverlay;
-	int		nDrawMode	= g_pParentWnd->GetCamera()->Camera().draw_mode;
 
 	// patches use two display lists, one for camera one for xy
 	if ( *list <= 0 ) {
@@ -3145,9 +3140,9 @@ void Parse2DMatrix( int y,int x,float *p ) {
 	GetToken(true); // )
 }
 
-void Parse3DMatrix( int z,int y,int x,float *p ) {
+void Parse3DMatrix( int _z,int y,int x,float *p ) {
 	GetToken(true); // (
-	for ( int i = 0; i < z; i++ ) {
+	for ( int i = 0; i < _z; i++ ) {
 		Parse2DMatrix(y, x, p + i * (x * MAX_PATCH_HEIGHT));
 	}
 	GetToken(true); // )
@@ -3904,11 +3899,11 @@ void _Write2DMatrix( FILE *f,int y,int x,float *m ) {
 }
 
 
-void _Write3DMatrix( FILE *f,int z,int y,int x,float *m ) {
+void _Write3DMatrix( FILE *f,int _z,int y,int x,float *m ) {
 	int	i;
 
 	fprintf(f, "(\n");
-	for ( i = 0 ; i < z ; i++ ) {
+	for ( i = 0 ; i < _z ; i++ ) {
 		_Write2DMatrix(f, y, x, m + i * (x * MAX_PATCH_HEIGHT));
 	}
 	fprintf(f, ")\n");
@@ -3940,11 +3935,11 @@ void _Write2DMatrix( CMemFile *f,int y,int x,float *m ) {
 }
 
 
-void _Write3DMatrix( CMemFile *f,int z,int y,int x,float *m ) {
+void _Write3DMatrix( CMemFile *f,int _z,int y,int x,float *m ) {
 	int	i;
 
 	MemFile_fprintf(f, "(\n");
-	for ( i = 0 ; i < z ; i++ ) {
+	for ( i = 0 ; i < _z ; i++ ) {
 		_Write2DMatrix(f, y, x, m + i * (x * MAX_PATCH_HEIGHT));
 	}
 	MemFile_fprintf(f, ")\n");
@@ -4381,11 +4376,6 @@ void Patch_FromTriangle( idVec5 vx,idVec5 vy,idVec5 vz ) {
 	p->ctrl(2, 1).st[1] = vMidYZ[4];
 	p->ctrl(2, 2).st[0] = vz[3];
 	p->ctrl(2, 2).st[1] = vz[4];
-
-
-	//Patch_Naturalize(p);
-
-	brush_t	*b	= AddBrushForPatch(p);
 }
 
 
diff --git a/neo/tools/radiant/PreviewDlg.cpp b/neo/tools/radiant/PreviewDlg.cpp
index ad80b98..12689ce 100644
--- a/neo/tools/radiant/PreviewDlg.cpp
+++ b/neo/tools/radiant/PreviewDlg.cpp
@@ -121,6 +121,11 @@ void CPreviewDlg::BuildTree() {
 		files = fileSystem->ListFilesTree( "models", ".ma" );
 		AddStrList( "base", files->GetList(), MODELS );
 		fileSystem->FreeFileList( files );
+#if USE_COLLADA
+		files = fileSystem->ListFilesTree("models", ".dae");
+		AddStrList("base", files->GetList(), MODELS);
+		fileSystem->FreeFileList(files);
+#endif
 	} else if ( currentMode == SOUNDS ) {
 		AddSounds( true );
 	} else if ( currentMode == MATERIALS ) {
@@ -147,9 +152,9 @@ void CPreviewDlg::AddCommentedItems() {
 	if (fileSystem->ReadFile(path, (void**)&buffer, NULL) && buffer) {
 		src.LoadMemory(buffer, strlen(buffer), path);
 		if (src.IsLoaded()) {
-			idToken token, tok1, tok2, tok3;
-			while( src.ReadToken( &token ) ) {
-				if (token == "{") {
+			idToken commenttoken, tok1, tok2, tok3;
+			while( src.ReadToken( &commenttoken) ) {
+				if (commenttoken == "{") {
 					// start a new commented item
 					CommentedItem ci;
 					if (src.ReadToken(&tok1) && src.ReadToken(&tok2) && src.ReadToken(&tok3)) {
@@ -278,7 +283,6 @@ void CPreviewDlg::AddStrList( const char *root, const idStrList &list, int id )
 
 void CPreviewDlg::OnTvnSelchangedTreeMedia(NMHDR *pNMHDR, LRESULT *pResult)
 {
-	LPNMTREEVIEW pNMTreeView = reinterpret_cast<LPNMTREEVIEW>(pNMHDR);
 	HTREEITEM item = treeMedia.GetSelectedItem();
 	mediaName = "";
 	CWnd *add = GetDlgItem(IDC_BUTTON_ADD);
diff --git a/neo/tools/radiant/PropertyList.cpp b/neo/tools/radiant/PropertyList.cpp
index 92d829f..222ff4a 100644
--- a/neo/tools/radiant/PropertyList.cpp
+++ b/neo/tools/radiant/PropertyList.cpp
@@ -88,6 +88,9 @@ BOOL CPropertyList::PreCreateWindow(CREATESTRUCT& cs) {
 }
 
 void CPropertyList::MeasureItem(LPMEASUREITEMSTRUCT lpMeasureItemStruct) {
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s20 = int(20 * scaling_factor);
+
 	if (measureItem && !measureItem->m_curValue.IsEmpty()) {
 		CRect rect;
 		GetClientRect(rect);
@@ -96,16 +99,20 @@ void CPropertyList::MeasureItem(LPMEASUREITEMSTRUCT lpMeasureItemStruct) {
 		}
 		rect.left = m_nDivider;
 		CDC * dc = GetDC();
-		dc->DrawText(measureItem->m_curValue, rect, DT_CALCRECT | DT_LEFT | DT_WORDBREAK);
+		int ret = dc->DrawText(measureItem->m_curValue, rect, DT_INTERNAL | DT_CALCRECT | DT_LEFT | DT_WORDBREAK);
 		ReleaseDC(dc);
-		lpMeasureItemStruct->itemHeight = (rect.Height() >= 20) ? rect.Height() : 20; //pixels
+		lpMeasureItemStruct->itemHeight = (ret >= s20) ? ret * scaling_factor : s20; //pixels
 	} else {
-		lpMeasureItemStruct->itemHeight = 20; //pixels
+		lpMeasureItemStruct->itemHeight = s20; //pixels
 	}
 }
 
 
 void CPropertyList::DrawItem(LPDRAWITEMSTRUCT lpDIS) {
+
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s3 = 3;// int(3 * scaling_factor);
+
 	CDC dc;
 	dc.Attach(lpDIS->hDC);
 	CRect rectFull = lpDIS->rcItem;
@@ -115,7 +122,7 @@ void CPropertyList::DrawItem(LPDRAWITEMSTRUCT lpDIS) {
 	}
 	rect.left = m_nDivider;
 	CRect rect2 = rectFull;
-	rect2.right = rect.left - 1;
+	rect2.right = rect.left - (1 * scaling_factor);
 	UINT nIndex = lpDIS->itemID;
 
 	if (nIndex != (UINT) -1) {
@@ -136,12 +143,12 @@ void CPropertyList::DrawItem(LPDRAWITEMSTRUCT lpDIS) {
 
 		//write the property name in the first rectangle
 		dc.SetBkMode(TRANSPARENT);
-		dc.DrawText(pItem->m_propName,CRect(rect2.left+3,rect2.top+3,
-											rect2.right-3,rect2.bottom+3),
+		dc.DrawText(pItem->m_propName,CRect(rect2.left+s3,rect2.top+s3,
+											rect2.right-s3,rect2.bottom+s3),
 					DT_LEFT | DT_SINGLELINE);
 
 		//write the initial property value in the second rectangle
-		dc.DrawText(pItem->m_curValue,CRect(rect.left+3,rect.top+3, rect.right+3,rect.bottom+3), DT_LEFT | (pItem->m_nItemType == PIT_VAR) ? DT_WORDBREAK : DT_SINGLELINE);
+		dc.DrawText(pItem->m_curValue,CRect(rect.left+s3,rect.top+s3, rect.right+s3,rect.bottom+s3), DT_LEFT | ((pItem->m_nItemType == PIT_VAR) ? DT_WORDBREAK : DT_SINGLELINE));
 	}
 	dc.Detach();
 }
@@ -187,6 +194,8 @@ void CPropertyList::OnSelchange() {
 	static int recurse = 0;
 	//m_curSel = GetCurSel();
 
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s3 = int(3 * scaling_factor);
 
 	GetItemRect(m_curSel,rect);
 	rect.left = m_nDivider;
@@ -208,7 +217,7 @@ void CPropertyList::OnSelchange() {
 		if (m_cmbBox) {
 			m_cmbBox.MoveWindow(rect);
 		} else {
-			rect.bottom += 300;
+			rect.bottom += (s3 * 100);
 			m_cmbBox.Create(CBS_DROPDOWNLIST | WS_VSCROLL | WS_VISIBLE | WS_CHILD | WS_BORDER,rect,this,IDC_PROPCMBBOX);
 			m_cmbBox.SetFont(&m_SSerif8Font);
 		}
@@ -242,7 +251,7 @@ void CPropertyList::OnSelchange() {
 		//display edit box
 		m_nLastBox = 1;
 		m_prevSel = m_curSel;
-		rect.bottom -= 3;
+		rect.bottom -= s3;
 		if (m_editBox) {
 			m_editBox.MoveWindow(rect);
 		} else {
@@ -268,11 +277,13 @@ void CPropertyList::DisplayButton(CRect region) {
 	//displays a button if the property is a file/color/font chooser
 	m_nLastBox = 2;
 	m_prevSel = m_curSel;
+	float scaling_factor = Win_GetWindowScalingFactor(GetSafeHwnd());
+	int s3 = int(3 * scaling_factor);
 
 	if (region.Width() > 25) {
 		region.left = region.right - 25;
 	}
-	region.bottom -= 3;
+	region.bottom -= s3;
 
 	if (m_btnCtrl) {
 		m_btnCtrl.MoveWindow(region);
diff --git a/neo/tools/radiant/Radiant.cpp b/neo/tools/radiant/Radiant.cpp
index c7c2216..018a8b1 100644
--- a/neo/tools/radiant/Radiant.cpp
+++ b/neo/tools/radiant/Radiant.cpp
@@ -123,11 +123,12 @@ void RadiantInit( void ) {
 		Sys_GrabMouseCursor( false );
 
 		g_DoomInstance = win32.hInstance;
-		CWinApp* pApp = AfxGetApp();
-		CWinThread *pThread = AfxGetThread();
 
 		InitAfx();
 
+		CWinApp* pApp = AfxGetApp();
+		CWinThread *pThread = AfxGetThread();
+
 		// App global initializations (rare)
 		pApp->InitApplication();
 
@@ -175,7 +176,8 @@ void RadiantRun( void ) {
 			theApp.Run();
 			//qglPopAttrib();
 			//qwglMakeCurrent(0, 0);
-			qwglMakeCurrent(win32.hDC, win32.hGLRC);
+			if (win32.hDC != NULL && win32.hGLRC != NULL)
+				qwglMakeCurrent(win32.hDC, win32.hGLRC);
 		}
 	}
 	catch( idException &ex ) {
diff --git a/neo/tools/radiant/WIN_DLG.CPP b/neo/tools/radiant/WIN_DLG.CPP
index b6523ef..f4e680c 100644
--- a/neo/tools/radiant/WIN_DLG.CPP
+++ b/neo/tools/radiant/WIN_DLG.CPP
@@ -31,7 +31,7 @@ If you have questions concerning this license or the applicable additional terms
 
 #include "qe3.h"
 
-BOOL CALLBACK EditCommandDlgProc (
+INT_PTR CALLBACK EditCommandDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -96,7 +96,7 @@ BOOL CALLBACK EditCommandDlgProc (
 	return FALSE;
 }
 
-BOOL CALLBACK AddCommandDlgProc (
+INT_PTR CALLBACK AddCommandDlgProc (
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -161,7 +161,7 @@ void UpdateBSPCommandList (HWND hwndDlg)
 
 
 // FIXME: turn this into an MFC dialog
-BOOL CALLBACK ProjectDlgProc (
+INT_PTR CALLBACK ProjectDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -255,7 +255,7 @@ void DoProjectSettings()
 
 
 
-BOOL CALLBACK GammaDlgProc (
+INT_PTR CALLBACK GammaDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -390,7 +390,7 @@ void GetSelectionIndex (int *ent, int *brush)
 	;
 }
 
-BOOL CALLBACK FindBrushDlgProc (
+INT_PTR CALLBACK FindBrushDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -450,7 +450,7 @@ void DoFind(void)
 */
 
 
-BOOL CALLBACK RotateDlgProc (
+INT_PTR CALLBACK RotateDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -517,7 +517,7 @@ void DoRotate(void)
 
 bool g_bDoCone = false;
 bool g_bDoSphere = false;
-BOOL CALLBACK SidesDlgProc (
+INT_PTR CALLBACK SidesDlgProc(
 	HWND hwndDlg,	// handle to dialog box
 	UINT uMsg,	// message
 	WPARAM wParam,	// first message parameter
@@ -575,7 +575,7 @@ void DoSides(bool bCone, bool bSphere, bool bTorus)
 DoAbout
 ===================
 */
-BOOL CALLBACK AboutDlgProc( HWND hwndDlg,
+INT_PTR CALLBACK AboutDlgProc( HWND hwndDlg,
 							UINT uMsg,
 							WPARAM wParam,
 							LPARAM lParam )
diff --git a/neo/tools/radiant/WIN_QE3.CPP b/neo/tools/radiant/WIN_QE3.CPP
index e234c22..a2b08a3 100644
--- a/neo/tools/radiant/WIN_QE3.CPP
+++ b/neo/tools/radiant/WIN_QE3.CPP
@@ -176,7 +176,7 @@ int WINAPI QEW_SetupPixelFormat(HDC hDC, bool zbuffer)
 {
 #if 1
 
-	int pixelFormat = ChoosePixelFormat(hDC, &win32.pfd);
+	int pixelFormat = Win_ChoosePixelFormat(hDC);
 	if (pixelFormat > 0) {
 		if (SetPixelFormat(hDC, pixelFormat, &win32.pfd) == NULL) {
 			Error("SetPixelFormat failed.");
@@ -281,11 +281,6 @@ void Error(char *error, ...) {
 void Warning(char *error, ...) {
 	va_list argptr;
 	char	text[1024];
-	int		err;
-
-	err = GetLastError();
-
-	int i = qglGetError();
 
 	va_start(argptr, error);
 	vsprintf(text, error, argptr);
@@ -320,7 +315,7 @@ static char			szFilter[260] =			/* filter string */
 static char			szProjectFilter[260] =	/* filter string */
 "Q3Radiant project (*.qe4, *.prj)\0*.qe4\0*.prj\0\0";
 static char			chReplace;				/* string separator for szFilter */
-static int			i, cbString;			/* integer count variables */
+static int			cbString;				/* integer count variable */
 static HANDLE		hf;						/* file handle */
 
 /*
diff --git a/neo/tools/radiant/XYWnd.cpp b/neo/tools/radiant/XYWnd.cpp
index 3250e16..8675958 100644
--- a/neo/tools/radiant/XYWnd.cpp
+++ b/neo/tools/radiant/XYWnd.cpp
@@ -1870,8 +1870,6 @@ void CreateSmartEntity(CXYWnd *pWnd, int x, int y, const char *pName) {
 void FinishSmartCreation() {
 	CPtrArray	array;
 	HideInfoDialog();
-
-	brush_t *pEntities = NULL;
 	if (g_strSmartEntity.Find("Smart_Train") >= 0) {
 		g_bScreenUpdates = false;
 		CreateRightClickEntity(g_pParentWnd->ActiveXY(), g_nSmartX, g_nSmartY, "func_train");
@@ -2098,7 +2096,7 @@ bool MergeMenu(CMenu * pMenuDestination, const CMenu * pMenuAdd, bool bTopLevel
 			HMENU hNewMenu = NewPopupMenu.GetSafeHmenu();
 			if (pMenuDestination->InsertMenu(iInsertPosDefault,
 				MF_BYPOSITION | MF_POPUP | MF_ENABLED,
-				(UINT)hNewMenu, sMenuAddString ))
+				(UINT_PTR)hNewMenu, sMenuAddString ))
 			{
 				// don't forget to correct the item count
 				iMenuDestItemCount++;
@@ -2169,7 +2167,7 @@ void CXYWnd::HandleDrop() {
 					if (pChild) {
 						pMakeEntityPop->AppendMenu (
 							MF_POPUP,
-							reinterpret_cast < unsigned int > (pChild->GetSafeHmenu()),
+							reinterpret_cast <UINT_PTR> (pChild->GetSafeHmenu()),
 							strActive
 						);
 						g_ptrMenus.Add(pChild);
@@ -2188,7 +2186,7 @@ void CXYWnd::HandleDrop() {
 				if (pChild) {
 					pMakeEntityPop->AppendMenu (
 						MF_POPUP,
-						reinterpret_cast < unsigned int > (pChild->GetSafeHmenu()),
+						reinterpret_cast <UINT_PTR> (pChild->GetSafeHmenu()),
 						strActive
 					);
 					g_ptrMenus.Add(pChild);
@@ -2204,7 +2202,7 @@ void CXYWnd::HandleDrop() {
 		if ( pMakeEntityPop != &m_mnuDrop ) {
 			m_mnuDrop.AppendMenu (
 				MF_POPUP,
-				reinterpret_cast < unsigned int > (pMakeEntityPop->GetSafeHmenu()),
+				reinterpret_cast <UINT_PTR> (pMakeEntityPop->GetSafeHmenu()),
 				"Make Entity"
 			);
 		}
@@ -2874,8 +2872,6 @@ void CXYWnd::XY_DrawGrid() {
 		// glColor4f(0, 0, 0, 0);
 		qglColor3fv(g_qeglobals.d_savedinfo.colors[COLOR_GRIDTEXT].ToFloatPtr());
 
-		float	lastRaster = xb;
-
 		for (x = xb; x < xe; x += stepSize) {
 			qglRasterPos2f(x, m_vOrigin[nDim2] + h - 10 / m_fScale);
 			sprintf(text, "%i", (int)x);
@@ -3330,7 +3326,7 @@ void DrawPathLines(void) {
 
 	num_entities = 0;
 	for (te = entities.next; te != &entities && num_entities != MAX_MAP_ENTITIES; te = te->next) {
-		for (int i = 0; i < 2048; i++) {
+		for (i = 0; i < 2048; i++) {
 			if (i == 0) {
 				ent_target[num_entities] = ValueForKey(te, "target");
 			} else {
@@ -4260,9 +4256,7 @@ bool CXYWnd::UndoAvailable() {
 void CXYWnd::Paste()
 {
 #if 1
-
 	CWaitCursor WaitCursor;
-	bool		bPasted = false;
 	UINT		nClipboard = ::RegisterClipboardFormat("RadiantClippings");
 	if (nClipboard > 0 && OpenClipboard() && ::IsClipboardFormatAvailable(nClipboard)) {
 		HANDLE	h = ::GetClipboardData(nClipboard);
@@ -4285,7 +4279,7 @@ void CXYWnd::Paste()
 
 		int		nLen = g_Clipboard.GetLength();
 		char	*pBuffer = new char[nLen + 1];
-		memset(pBuffer, 0, sizeof(*pBuffer) * (nLen + 1));
+		memset(pBuffer, 0, nLen + 1);
 		g_Clipboard.Read(pBuffer, nLen);
 		pBuffer[nLen] = '\0';
 		Map_ImportBuffer(pBuffer, !(GetAsyncKeyState(VK_SHIFT) & 0x8000));
@@ -4366,7 +4360,7 @@ idVec3 &CXYWnd::RotateOrigin() {
  =======================================================================================================================
  =======================================================================================================================
  */
-void CXYWnd::OnTimer(UINT nIDEvent) {
+void CXYWnd::OnTimer(UINT_PTR nIDEvent) {
 	if (nIDEvent == 100) {
 		int nDim1 = (m_nViewType == YZ) ? 1 : 0;
 		int nDim2 = (m_nViewType == XY) ? 1 : 2;
@@ -4505,8 +4499,7 @@ BOOL CXYWnd::OnMouseWheel(UINT nFlags, short zDelta, CPoint pt)
 void CXYWnd::DrawPrecisionCrosshair( void )
 {
 	// FIXME: m_mouseX, m_mouseY, m_axisHoriz, m_axisVert, etc... are never set
-	return;
-
+#if 0
 	idVec3 mouse3dPos (0.0f, 0.0f, 0.0f);
 	float x, y;
 	idVec4 crossEndColor (1.0f, 0.0f, 1.0f, 1.0f); // the RGBA color of the precision crosshair at its ends
@@ -4566,4 +4559,5 @@ void CXYWnd::DrawPrecisionCrosshair( void )
 	// Radiant was in opaque, flat-shaded mode by default; restore this to prevent possible slowdown
 	qglShadeModel( GL_FLAT );
 	qglDisable( GL_BLEND );
+#endif
 }
diff --git a/neo/tools/radiant/XYWnd.h b/neo/tools/radiant/XYWnd.h
index 94b36cb..b55bd79 100644
--- a/neo/tools/radiant/XYWnd.h
+++ b/neo/tools/radiant/XYWnd.h
@@ -242,7 +242,7 @@ protected:
 	afx_msg void OnSize(UINT nType, int cx, int cy);
 	afx_msg void OnDestroy();
 	afx_msg void OnSelectMouserotate();
-	afx_msg void OnTimer(UINT nIDEvent);
+	afx_msg void OnTimer(UINT_PTR nIDEvent);
 	afx_msg void OnKeyUp(UINT nChar, UINT nRepCnt, UINT nFlags);
 	afx_msg void OnNcCalcSize(BOOL bCalcValidRects, NCCALCSIZE_PARAMS FAR* lpncsp);
 	afx_msg void OnKillFocus(CWnd* pNewWnd);
diff --git a/neo/tools/radiant/splines.cpp b/neo/tools/radiant/splines.cpp
index 4c5b6d6..c67a615 100644
--- a/neo/tools/radiant/splines.cpp
+++ b/neo/tools/radiant/splines.cpp
@@ -403,7 +403,6 @@ idSplineList::buildSpline
 ================
 */
 void idSplineList::buildSpline() {
-	int start = Sys_Milliseconds();
 	clearSpline();
 	for(int i = 3; i < controlPoints.Num(); i++) {
 		for (float tension = 0.0f; tension < 1.001f; tension += granularity) {
@@ -419,7 +418,6 @@ void idSplineList::buildSpline() {
 		}
 	}
 	dirty = false;
-	//common->Printf("Spline build took %f seconds\n", (float)(Sys_Milliseconds() - start) / 1000);
 }
 
 /*
@@ -792,7 +790,6 @@ idCameraDef::addTarget
 ================
 */
 void idCameraDef::addTarget(const char *name, idCameraPosition::positionType type) {
-	const char *text = (name == NULL) ? va("target0%d", numTargets()+1) : name;
 	idCameraPosition *pos = newFromType(type);
 	if (pos) {
 		pos->setName(name);
@@ -1080,7 +1077,6 @@ idCameraDef::buildCamera
 
 void idCameraDef::buildCamera() {
 	int i;
-	int lastSwitch = 0;
 	idList<float> waits;
 	idList<int> targets;
 
@@ -1089,7 +1085,6 @@ void idCameraDef::buildCamera() {
 	// we have a base time layout for the path and the target path
 	// now we need to layer on any wait or speed changes
 	for (i = 0; i < events.Num(); i++) {
-		idCameraEvent *ev = events[i];
 		events[i]->setTriggered(false);
 		switch (events[i]->getType()) {
 			case idCameraEvent::EVENT_TARGET : {
@@ -1673,11 +1668,6 @@ idInterpolatedPosition::getPosition
 */
 const idVec3 *idInterpolatedPosition::getPosition( long t ) {
 	static idVec3 interpolatedPos;
-
-	if (t - startTime > 6000) {
-		int i = 0;
-	}
-
 	float velocity = getVelocity(t);
 	float timePassed = t - lastTime;
 	lastTime = t;
@@ -1685,10 +1675,6 @@ const idVec3 *idInterpolatedPosition::getPosition( long t ) {
 	// convert to seconds
 	timePassed /= 1000;
 
-	if (velocity != getBaseVelocity()) {
-		int i = 0;
-	}
-
 	float distToTravel = timePassed * velocity;
 
 	idVec3 temp = startPos;
@@ -2021,7 +2007,7 @@ const idVec3 *idSplinePosition::getPosition(long t) {
 
 		idVec3 v1 = *target.getSegmentPoint(i - 1);
 		idVec3 v2 = *target.getSegmentPoint(i);
-		double percent = (lastDistance2 - targetDistance) / (lastDistance2 - lastDistance1);
+		percent = (lastDistance2 - targetDistance) / (lastDistance2 - lastDistance1);
 		v2 *= (1.0f - percent);
 		v1 *= percent;
 		v2 += v1;
diff --git a/neo/tools/sound/DialogSound.cpp b/neo/tools/sound/DialogSound.cpp
index 6949c10..42b1cc6 100644
--- a/neo/tools/sound/DialogSound.cpp
+++ b/neo/tools/sound/DialogSound.cpp
@@ -635,7 +635,6 @@ void CDialogSound::SetWaveSize( const char *p ) {
 
 void CDialogSound::OnSelchangedTreeSounds(NMHDR* pNMHDR, LRESULT* pResult)
 {
-	NM_TREEVIEW* pNMTreeView = (NM_TREEVIEW*)pNMHDR;
 	HTREEITEM	item = treeSounds.GetSelectedItem();
 	SetWaveSize();
 	if (item) {
diff --git a/neo/ui/ChoiceWindow.cpp b/neo/ui/ChoiceWindow.cpp
index 5351466..0e34083 100644
--- a/neo/ui/ChoiceWindow.cpp
+++ b/neo/ui/ChoiceWindow.cpp
@@ -440,7 +440,9 @@ void idChoiceWindow::Draw(int time, float x, float y) {
 		color = hoverColor;
 	}
 
-	dc->DrawText( choices[currentChoice], textScale, textAlign, color, textRect, false, -1 );
+	if(choices.Num() > 0) {
+		dc->DrawText( choices[currentChoice], textScale, textAlign, color, textRect, false, -1 );
+	}
 }
 
 void idChoiceWindow::Activate( bool activate, idStr &act ) {
diff --git a/neo/ui/EditWindow.cpp b/neo/ui/EditWindow.cpp
index 35ae054..39106d8 100644
--- a/neo/ui/EditWindow.cpp
+++ b/neo/ui/EditWindow.cpp
@@ -210,7 +210,7 @@ const char *idEditWindow::HandleEvent(const sysEvent_t *event, bool *updateVisua
 	int len = text.Length();
 
 	if ( event->evType == SE_CHAR ) {
-		if ( event->evValue == Sys_GetConsoleKey( false ) || event->evValue == Sys_GetConsoleKey( true ) ) {
+		if ( event->evValue == Sys_GetConsoleKey( idKeyInput::IsDown( K_SHIFT ) ) ) {
 			return "";
 		}
 
diff --git a/neo/ui/ListWindow.cpp b/neo/ui/ListWindow.cpp
index f67f537..e62a53a 100644
--- a/neo/ui/ListWindow.cpp
+++ b/neo/ui/ListWindow.cpp
@@ -600,23 +600,33 @@ void idListWindow::UpdateList() {
 	}
 	float vert = GetMaxCharHeight();
 	int fit = textRect.h / vert;
+	int selection = gui->State().GetInt( va( "%s_sel_0", listName.c_str() ) );
 	if ( listItems.Num() < fit ) {
 		scroller->SetRange(0.0f, 0.0f, 1.0f);
+		top = 0;
+		scroller->SetValue(0.0f);
 	} else {
 		scroller->SetRange(0.0f, (listItems.Num() - fit) + 1.0f, 1.0f);
-	}
 
-	SetCurrentSel( gui->State().GetInt( va( "%s_sel_0", listName.c_str() ) ) );
+		// DG: scroll to selected item
+		float value = scroller->GetValue();
+		if ( value < 0.0f ) {
+			value = 0.0f;
+			top = 0;
+		} else if ( value > listItems.Num() - 1 ) {
+			value = listItems.Num() - 1;
+		}
+		float maxVisibleVal = Min(value + fit, scroller->GetHigh());
+		if ( selection >= 0 && (selection < value || selection > maxVisibleVal) ) {
+			// if selected entry is not currently visible, center it (if possible)
+			value = Max(0.0f, selection - 0.5f * fit);
+		}
 
-	float value = scroller->GetValue();
-	if ( value > listItems.Num() - 1 ) {
-		value = listItems.Num() - 1;
+		scroller->SetValue(value);
+		top = value;
 	}
-	if ( value < 0.0f ) {
-		value = 0.0f;
-	}
-	scroller->SetValue(value);
-	top = value;
+
+	SetCurrentSel( selection );
 
 	typedTime = 0;
 	clickTime = 0;
diff --git a/neo/ui/UserInterface.cpp b/neo/ui/UserInterface.cpp
index 2937de7..032b3c4 100644
--- a/neo/ui/UserInterface.cpp
+++ b/neo/ui/UserInterface.cpp
@@ -344,7 +344,7 @@ const char *idUserInterfaceLocal::HandleEvent( const sysEvent_t *event, int _tim
 		return ret;
 	}
 
-	if ( event->evType == SE_MOUSE ) {
+	if ( event->evType == SE_MOUSE || event->evType == SE_MOUSE_ABS ) {
 		if ( !desktop || (desktop->GetFlags() & WIN_MENUGUI) ) {
 			// DG: this is a fullscreen GUI, scale the mousedelta added to cursorX/Y
 			//     by 640/w, because the GUI pretends that everything is 640x480
@@ -374,8 +374,20 @@ const char *idUserInterfaceLocal::HandleEvent( const sysEvent_t *event, int _tim
 				}
 			}
 
-			cursorX += event->evValue * (float(VIRTUAL_WIDTH)/w);
-			cursorY += event->evValue2 * (float(VIRTUAL_HEIGHT)/h);
+			if( event->evType == SE_MOUSE ) {
+				cursorX += event->evValue * (float(VIRTUAL_WIDTH)/w);
+				cursorY += event->evValue2 * (float(VIRTUAL_HEIGHT)/h);
+			} else { // SE_MOUSE_ABS
+				// Note: In case of scaling to 4:3, w and h are already scaled down
+				//       to the 4:3 size that fits into the real resolution.
+				//       Otherwise xOffset/yOffset will just be 0
+				float xOffset = (renderSystem->GetScreenWidth()  - w) * 0.5f;
+				float yOffset = (renderSystem->GetScreenHeight() - h) * 0.5f;
+				// offset the mouse coordinates into 4:3 area and scale down to 640x480
+				// yes, result could be negative, doesn't matter, code below checks that anyway
+				cursorX = (event->evValue  - xOffset) * (float(VIRTUAL_WIDTH)/w);
+				cursorY = (event->evValue2 - yOffset) * (float(VIRTUAL_HEIGHT)/h);
+			}
 		} else {
 			// not a fullscreen GUI but some ingame thing - no scaling needed
 			cursorX += event->evValue;
diff --git a/neo/ui/Window.cpp b/neo/ui/Window.cpp
index 2bdf96f..3948cea 100644
--- a/neo/ui/Window.cpp
+++ b/neo/ui/Window.cpp
@@ -927,7 +927,7 @@ const char *idWindow::HandleEvent(const sysEvent_t *event, bool *updateVisuals)
 				}
 			}
 
-		} else if (event->evType == SE_MOUSE) {
+		} else if (event->evType == SE_MOUSE || event->evType == SE_MOUSE_ABS) {
 			if (updateVisuals) {
 				*updateVisuals = true;
 			}