# Captive project doc Details page Perl template.
# Copyright (C) 2003-2005 Jan Kratochvil <project-www.jankratochvil.net@jankratochvil.net>

# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; exactly version 2 of June 1991 is required

# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.

# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

package project::captive::doc::Details;
require 5.6.0; # at least 'use warnings;' but we need some 5.6.0+ modules anyway
our $VERSION=do { my @r=(q$Revision$=~/\d+/g); sprintf "%d.".("%03d"x$#r),@r; };
BEGIN { Wuse 'project::captive::doc::Macros'; }
project::captive::doc::Macros->init(
"__PACKAGE__"=>__PACKAGE__,
"title"=>'Captive NTFS Developer Documentation: Implementation Details',
"rel_prev"=>'CacheManager.pm',
"rel_next"=>'APITypes.pm',
<h1>Implementation Details</h1>

<h2 id="emulmeth">Choice of the Emulation Methods</h2>
<p>The intent of the project was to get reliable read-write access to
an <span class="productname">NTFS</span> partition. There are several possible
ways to achieve that:</p>

<h3 id="emulmeth_vm">Virtual Machine Running the Original W32 Subsystem</h3>

<p>Creating a virtual-hardware PC and running the original W32 binaries
including their boot loader etc. Disk device access would be passed as
a virtual IDE disk (hard disk drive). The file access API would be implemented
either by a special escape out of the virtual machine through some trapped
instruction while using the W32 file access API, or by using the standard W32
SMB (Server Message Block) network access through some virtual network
card. The latter network access solution is almost identical to the currently
available possibility of running a full-blown, disk-sharing, real
<span class="productname">Microsoft Windows NT</span> inside a virtual
machine emulator such as <span class="productname">VMware</span>.</p>

<p>pros: Full compatibility due to the fully native codebase.</p>

<p>cons: Hard to debug, missing documentation of NT booting internals,
possible problems caused by PC virtual hardware differing from what NT expects,
requirement of a fully installed
<span class="productname">Microsoft Windows NT</span> product.</p>
<h3 id="method_ntoskrnl">"ntoskrnl.exe" Inside Virtual Address Space</h3>

<p>This solution was chosen by the project. The binary filesystem driver and
also the <span class="fname">ntoskrnl.exe</span> binary file are required.
Unfortunately <span class="fname">ntoskrnl.exe</span> expects native
PC hardware, which is missing when it runs inside a regular UNIX user space
process. Instructions accessing such hardware must therefore be trapped and
emulated or ignored case by case.</p>

<p>Also the <span id="init_ntoskrnl">initialization code of <span
class="fname">ntoskrnl.exe</span></span> is not executed by this project since
it expects to get full PC hardware access privileges. Some data structures
therefore never get initialized by it and their use has to be trapped later at
the runtime stage. Some of the missing initializations are solved by
@{[ a_href 'APITypes.pm#functype_wrap','API functions wrapping' ]}.</p>

<p>pros: Lightweight, easier to debug.</p>

<p>cons: Possibly incompatible emulation of
<span class="fname">ntoskrnl.exe</span> parts, missing documentation needed
for the implementation.</p>
<h3 id="emulmeth_fs">Filesystem Driver Inside Virtual Address Space</h3>

<p>Unlike the @{[ a_href 'Details.pm#method_ntoskrnl','previous method' ]}, here we do not
even use <span class="fname">ntoskrnl.exe</span> as the complete kernel part of
W32 is <span id="native_ntoskrnl">emulated from the project source
files</span>. The <span class="fname">cdfs.sys</span> driver was successfully run
in this manner in former versions of this project but the possibility
to run without <span class="fname">ntoskrnl.exe</span> was dropped since it
had no licensing gains (you need the original
<span class="productname">Microsoft Windows NT</span> files at least for
the filesystem driver itself) and the emulation of the undocumented parts
reusable from the <span class="fname">ntoskrnl.exe</span> binary was

<p>pros: Lightweight, easier to debug.</p>

<p>cons: Possibly incompatible emulation of the whole
<span class="fname">ntoskrnl.exe</span>, its missing documentation.</p>
<h2 id="apichoice">API Function Implementation Choices</h2>

<p>At the start of the project development all the API
functions were defined as unimplemented, of course. Any call of such an
unimplemented function is fatal and results in program termination. When we
need to implement any required API function we have multiple choices to do
@{[ a_href 'APITypes.pm#functype_pass','Direct pass to original <span class="fname">ntoskrnl.exe</span>' ]},
@{[ a_href 'APITypes.pm#functype_wrap','Wrap of the original <span class="fname">ntoskrnl.exe</span> function' ]},
@{[ a_href 'APITypes.pm#functype_native_reactos','Native implementation – $ReactOS' ]},
@{[ a_href 'APITypes.pm#functype_native_wine','Native implementation – $Wine' ]}
@{[ a_href 'APITypes.pm#functype_native_libcaptive','Native implementation – project specific' ]}.
<!-- a_href 'APITypes.pm#functype_undef','Undefined function' -->
<h2 id="sandbox">Sandboxing of W32 Filesystem</h2>

<p>The emulated W32 environment running the original W32 filesystem driver
is separated from the rest of the UNIX OS. This achieves the following goals:</p>

<ul>
<li><b>Restartable</b>: The W32 driver can be restarted in a clean state if it crashes</li>
<li><b>Secure</b>: Malicious W32 code cannot affect the security of the UNIX OS</li>
<li><b>Stable</b>: Buggy W32 code cannot crash any part of the UNIX OS</li>
</ul>

<p>Sandboxing is provided with the following attributes:</p>

<ul>
<li>standalone UNIX process with a separate memory space</li>
<li>chroot(2) into an empty directory to prevent any UNIX OS filesystem access</li>
<li>setuid(2) to its own user/group to prevent interaction with UNIX processes</li>
<li>setrlimit(2) to limit the system resources available to the W32 environment</li>
<li>the only connection with the UNIX OS is by CORBA/ORBit RPC</li>
</ul>

@{[ doc_img 'dia/arch-all','Project Components Architecture' ]}

<p>This security is almost the same as that provided by
emulated virtual machines such as
@{[ a_href 'http://www.vmware.com/solutions/security.html','VMware' ]}.</p>
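<p>The confinement steps listed above can be sketched in C as follows. This is
only a minimal illustration, not the project's actual code: the function names
<span class="function">sandbox_limit()</span> and
<span class="function">sandbox_enter()</span>, the jail directory, the account
IDs and the chosen limits are all assumptions. Dropping the supplementary
groups with setgroups(2) is an extra precaution not mentioned in the list.</p>

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1       /* for chroot() and setgroups() on glibc */
#endif
#include <grp.h>
#include <stdio.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <unistd.h>

/* Cap one resource (both soft and hard limit) for the calling process. */
int sandbox_limit(int resource, rlim_t max)
{
    struct rlimit lim;

    lim.rlim_cur = max;
    lim.rlim_max = max;
    return setrlimit(resource, &lim);
}

/*
 * Hypothetical helper: confine the calling process before any W32 code runs.
 * jail_dir is assumed to be an empty directory; uid/gid are assumed to belong
 * to a dedicated unprivileged account. It has to be called with root
 * privileges. Returns 0 on success, -1 on the first failing step.
 */
int sandbox_enter(const char *jail_dir, uid_t uid, gid_t gid,
                  rlim_t max_memory_bytes)
{
    /* chroot(2): hide the whole UNIX filesystem from the sandboxed code. */
    if (chroot(jail_dir) != 0 || chdir("/") != 0) {
        perror("chroot");
        return -1;
    }
    /* Drop the groups first: after setuid() we may no longer change them. */
    if (setgroups(0, NULL) != 0 || setgid(gid) != 0 || setuid(uid) != 0) {
        perror("setgroups/setgid/setuid");
        return -1;
    }
    /* setrlimit(2): bound the address space and forbid core dumps. */
    if (sandbox_limit(RLIMIT_AS, max_memory_bytes) != 0
            || sandbox_limit(RLIMIT_CORE, 0) != 0) {
        perror("setrlimit");
        return -1;
    }
    return 0;
}
```

<p>The order matters in such a sketch: chroot(2) needs root privileges, so it
has to happen before setuid(2) gives them up for good.</p>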
@{[ doc_img 'dia/inheritance','Sandboxing Scheme' ]}

<p>The project can also be used in non-sandboxed mode with the
<span class="command">--no-sandbox</span> option as it is easier to debug
without CORBA/ORBit RPC. In this case the
<span class="type">DirectorySlave</span>/<span class="type">FileSlave</span>
options are used directly instead of their
<span class="type">DirectoryParent</span>/<span class="type">FileParent</span>
<h2 id="patched">"patched" vs. "unpatched" Libraries</h2>

<p>A library is called <span class="constant">patched</span> if we require
loading its original binary code file. The project needs to patch it to be able
to trap all its function entry points. Currently the only
<span class="constant">patched</span> library of this project is
<span class="fname">ntoskrnl.exe</span>.</p>

<p>A library is called <span class="constant">unpatched</span> if no original
binary code is needed since all of its functions are completely emulated by
@{[ a_href 'APITypes.pm#functype_native','the native implementations' ]} of this project.
The typical <span class="constant">unpatched</span> representative is
<span class="fname">hal.dll</span> as it specializes in the hardware
dependent code and therefore it must be completely replaced by this project
running in the $gnulinux operating system environment. Early versions of
this project also had a full <span class="constant">unpatched</span>
<a href="#native_ntoskrnl">native implementation of
<span class="fname">ntoskrnl.exe</span></a> but this no longer applies.</p>
<h2 id="mman">Memory Management</h2>

<p>The original <span class="productname">Microsoft Windows NT</span>
architecture uses two address space areas – user space and kernel space.
User space is mapped in the range <span class="constant">0x00000000</span>
to <span class="constant">0x7FFFFFFF</span>, kernel space is mapped in the
range <span class="constant">0x80000000</span>
(<span class="constant">KERNEL_BASE</span> in $ReactOS sources) to
<span class="constant">0xFFFFFFFF</span>. All these virtual memory ranges
represent addresses after their MMU (Memory Management Unit) mapping, of
course. More discussion can be found in the
<a href="http://www.microsoft.com/hwdev/platform/server/PAE/PAEmem.asp">description
by <span class="productname">Microsoft</span></a>.</p>

<p>This project runs in a single virtual address space used both for the UNIX
user space process part and for the W32 kernel space. Therefore this
project defines that the W32 kernel runs in the whole range
<span class="constant">0x00000000</span> to
<span class="constant">0xFFFFFFFF</span> since there are no special mapping
assumptions about the UNIX user space process mapping. No W32 user space
exists in this project. This approach also removes the need for any special memory
moving operations between W32 kernel space and W32 user space memory areas
(such as <span class="function">MmSafeCopyToUser()</span>).</p>
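<p>The difference between the two address space models can be illustrated by
a short C sketch. All the names below are hypothetical and exist only for this
illustration; the constant is the kernel base address stated above.</p>

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Start of kernel space in the native NT layout (KERNEL_BASE in ReactOS). */
#define SKETCH_NT_KERNEL_BASE 0x80000000u

/* Native NT layout: only the upper half of the 32-bit range is kernel space. */
int sketch_nt_is_kernel_address(uint32_t address)
{
    return address >= SKETCH_NT_KERNEL_BASE;
}

/* Layout assumed by this sketch for the project: the W32 kernel owns the
 * whole range 0x00000000 to 0xFFFFFFFF, so every address counts as kernel. */
int sketch_project_is_kernel_address(uint32_t address)
{
    (void)address;
    return 1;
}

/* With no W32 user space there is nothing to guard against, so a
 * "safe copy to user" operation can degenerate to a plain memcpy(). */
int sketch_copy_to_user(void *dest, const void *src, size_t length)
{
    memcpy(dest, src, length);
    return 0;
}
```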
<h2 id="unicode">Unicode Strings and Characters</h2>

<p>The W32 platform uses a 16-bit <span class="type">wchar_t</span> type while $gnulinux uses a
32-bit one. This can be a problem when GCC (GNU C Compiler)
compiles a combination of native UNIX C sources (assuming 32-bit
GCC with a 32-bit <span class="type">wchar_t</span>) and
$ReactOS C sources (assuming a W32 compiler with a 16-bit
<span class="type">wchar_t</span>) containing literal wide strings
(C source file syntax: <span class="command">L"wstring"</span>).
The possible solutions to this issue are:</p>

<p>Use the <span class="constant">-fshort-wchar</span> GCC option and
strictly differentiate between the compilation of
<span class="productname">ReactOS</span> code and UNIX code.</p>

<p>pros: No source modifications needed, no runtime performance hit.</p>

<p>cons: No type checking if some part of the code has bad compilation
flags, complicated way to completely split
<span class="productname">ReactOS</span> and UNIX code.</p>

<p>Wrap all <span class="productname">ReactOS</span> literal constants
in a conversion function call (implemented as the macro
<span class="function">REACTOS_UCS2()</span> by this project).</p>

<p>pros: Any forgotten/mistaken conversions are type-checked and reported
as warnings during the compilation by GCC.</p>

<p>cons: All compiled <span class="productname">ReactOS</span> source
files containing literal wide strings have to be wrapped/modified,
performance hit by runtime string conversions.</p>
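<p>A runtime conversion wrapper of this kind could look roughly like the
following C sketch. It is not the project's implementation: the function
<span class="function">wide_to_ucs2()</span> and the macro
<span class="function">SKETCH_UCS2()</span> are hypothetical names, and the
sketch simply allocates a new string on every call.</p>

```c
#include <stdint.h>
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper: convert a native wide string (32-bit wchar_t on
 * GNU/Linux) to a freshly allocated, zero-terminated 16-bit string.
 * Returns NULL on allocation failure or if a character does not fit
 * into 16 bits. The caller frees the result.
 */
uint16_t *wide_to_ucs2(const wchar_t *src)
{
    size_t length = wcslen(src);
    uint16_t *dest = malloc((length + 1) * sizeof(*dest));
    size_t i;

    if (dest == NULL)
        return NULL;
    for (i = 0; i < length; i++) {
        if ((uint32_t)src[i] > 0xFFFFu) {   /* outside the 16-bit range */
            free(dest);
            return NULL;
        }
        dest[i] = (uint16_t)src[i];
    }
    dest[length] = 0;
    return dest;
}

/* Hypothetical wrapper in the spirit of REACTOS_UCS2(): it takes a plain
 * string literal, pastes the L prefix onto it and converts the result.
 * The distinct uint16_t result type is what lets the compiler complain
 * when a 16-bit string is mixed up with a native wchar_t one. */
#define SKETCH_UCS2(literal) wide_to_ucs2(L##literal)
```

<p>A call such as <span class="command">SKETCH_UCS2("NTFS")</span> then yields
a 16-bit string regardless of the native <span class="type">wchar_t</span>
width, at the cost of one conversion per call.</p>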
<p>This solution was chosen to get the internal sanity checking

<h2 id="binfmt">Supported Binary Formats</h2>

<p>The native W32 binary format is identified as
<span class="constant">PE-32</span> (Portable Executable 32-bit); such
files have all the usual extensions such as
<span class="fname">.sys</span>, <span class="fname">.exe</span>,
<span class="fname">.dll</span> etc. <span class="constant">PE-32</span>
loading support was already implemented by $ReactOS, its memory mapping
specifics just had to be ported to the $gnulinux environment by this project.
This loading support does not (yet) cover importing of debug symbols from
W32 <span class="fname">.PDB</span> (Program DataBase) files in a $gnulinux
ABI (Application Binary Interface) compatible way.</p>

<p>This project also supports transparent loading of the UNIX
<span class="fname">.so</span> (Shared Object file) binary format. If you
have W32 source files for some W32 library you can try to compile them with GCC
to get a shared library with $gnulinux ABI compatible debug information
(GCC option <span class="constant">-ggdb3</span> recommended). Beware of
possible compilation problems as <span class="productname">Microsoft</span>
C code expects <span class="constant">exception</span> handling to be
supported by the compiler (definitely not the case for the plain C compiler
of GCC). All the exception catching code should be discarded as any
@{[ a_href '#exception_fatal','generated exceptions are always fatal' ]} when
such a driver is running in the scope of this project. You can use the
following script of this project to compile W32 filesystem source files as
a UNIX <span class="fname">.so</span>:
@{[ captive_srcfile 'src/w32-mod/ext2fsd.so-build.sh' ]}</p>
<p>Be aware of some differences between using a
<span class="constant">PE-32</span> binary format file and a
<span class="fname">.so</span> format file.
<span class="constant">PE-32</span> files use the appropriate W32 specific
@{[ a_href '#calltype','cdecl/stdcall/fastcall call types' ]} while
<span class="fname">.so</span> files must be completely compiled with the standard
UNIX @{[ a_href 'CallType.pm#calltype_cdecl','cdecl call type semantics' ]}.
@{[ a_href 'APITypes.pm#functype_native','Native function implementations' ]} do not need
to be explicitly exported by <span class="fname">captivesym</span> as they
are resolved automatically by the UNIX dynamic system linker. It may
therefore come as a surprise that you have to fix all such missing symbol
exports when you advance during the development from the debugging
<span class="fname">.so</span> file to the production version using the
original <span class="constant">PE-32</span> binary file.</p>
<h2 id="mounted_one">At Most One Mounted Filesystem</h2>

<p>The project technically supports only one (exactly one...) mounted
filesystem device and only one filesystem driver. It would not be
complicated to support multiple disks and multiple loaded filesystem
modules but as they would share the address space it would only
complicate bug reports and the bug solving
itself. It was considered saner to support multiple mounted W32
disks by completely separate project instances running in
different UNIX processes and communicating from their sandboxes via the
@{[ a_href 'Details.pm#sandbox','CORBA sandbox interface' ]}. This sandboxing
feature is not yet deployed although its code is already prepared.</p>

<p>The project also does not support any state cleanup that would allow it to load
filesystem <span class="constant">A</span>,
clean up <span class="constant">A</span> and load a different
filesystem <span class="constant">B</span> in the same process address
space. This is consistent with avoiding the possible debugging
complications noted above. Despite this you still must call the function
<span class="function">captive_shutdown()</span> to flush all the pending
filesystem buffers to the disk. After calling
<span class="function">captive_shutdown()</span> the process address space is
no longer usable for any further project operations and the process is
expected to be terminated in a manner compatible with its driving
@{[ a_href 'Details.pm#sandbox','CORBA sandbox interface' ]} control master.</p>

<p>Each sandbox executing the untrusted W32 binary filesystem driver code
is connected through its
@{[ a_href 'Details.pm#sandbox','CORBA sandbox interface' ]} at the point of the upper
layer <span class="constant">libcaptive</span>-specific filesystem API, at
the point of the bottom layer of <span class="type">GIOChannel</span>
device access and also for transfers of GLib logging
messages/warnings/errors out of the sandbox to the user.</p>
<h2 id="synchronous">Multithreading and Multiple Processors</h2>

<p>The W32 platform is built on a thoroughly parallel architecture. It
must lock all its objects to maintain coherence in the presence of
multithreading and multiple processors. Since the author of this project
considers any parallel execution a serious obstacle to debugging, the whole
project architecture was designed to prevent any nondeterministic behaviour.
Therefore this project always emulates a uniprocessor
<span class="productname">Microsoft Windows NT</span> kernel
(the <span class="constant">KeNumberProcessors</span> symbol is always 1),
everything runs in the single initial thread/process and all the filesystem
operations are performed as synchronous
("synchronous" by flags
<span class="constant">FILE_SYNCHRONOUS_IO_ALERT</span>,
<span class="constant">FO_SYNCHRONOUS_IO</span>,
<span class="constant">IRP_SYNCHRONOUS_API</span>,
<span class="constant">IRP_SYNCHRONOUS_PAGING_IO</span>,
forced <span class="constant">TRUE</span> result of
<span class="function">IoIsOperationSynchronous()</span>
For several cases needed only by <span class="fname">ntfs.sys</span>
asynchronous access had to be supported
(<span class="constant">STATUS_PENDING</span> return code) – parallel
execution is emulated by GLib
<span class="function">g_idle_add_full()</span> with
<span class="function">g_main_context_iteration()</span> called during
<span class="function">KeWaitForSingleObject()</span>.</p>
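<p>The idea of emulating parallel execution in a single thread can be shown
with a plain C stand-in. The sketch below deliberately does not use GLib: a
small hypothetical queue plays the role of
<span class="function">g_idle_add_full()</span> and
<span class="function">g_main_context_iteration()</span>, and
<span class="function">wait_for_flag()</span> stands for a wait that keeps
running pending callbacks until its object becomes signalled. All names are
assumptions made for this illustration.</p>

```c
#include <stddef.h>

typedef void (*work_func)(void *data);

struct work_item {
    work_func func;
    void *data;
};

#define WORK_QUEUE_MAX 16

static struct work_item work_queue[WORK_QUEUE_MAX];
static size_t work_head, work_count;

/* Queue a callback for later execution (the role of g_idle_add_full()).
 * Returns 0 on success, -1 if the queue is full. */
int work_add(work_func func, void *data)
{
    if (work_count == WORK_QUEUE_MAX)
        return -1;
    work_queue[(work_head + work_count) % WORK_QUEUE_MAX] =
        (struct work_item){ func, data };
    work_count++;
    return 0;
}

/* Run one pending callback (the role of g_main_context_iteration()).
 * Returns 1 if a callback ran, 0 if nothing was pending. */
int work_iterate(void)
{
    struct work_item item;

    if (work_count == 0)
        return 0;
    item = work_queue[work_head];
    work_head = (work_head + 1) % WORK_QUEUE_MAX;
    work_count--;
    item.func(item.data);
    return 1;
}

/* Wait until *signalled becomes nonzero by running pending callbacks in
 * the same thread. Returns 0 once signalled, -1 if no work is left that
 * could ever signal it (a deadlock in single-threaded terms). */
int wait_for_flag(const int *signalled)
{
    while (!*signalled)
        if (!work_iterate())
            return -1;
    return 0;
}

/* Example callback: completes a pending request by setting its flag. */
void complete_request(void *data)
{
    *(int *)data = 1;
}
```

<p>In such a scheme the "asynchronous" work never runs concurrently with its
requester; it only runs at the well-defined points where the requester
waits.</p>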
<p>Since real W32 parallel threading may still be needed
in the future, all the code that would be affected by W32
multithreading capability is marked by a
<span class="constant">TODO:thread</span> comment.</p>

<p>Multiple processors (SMP) support will never need to be implemented
since uniprocessor W32 kernels apparently run the filesystem driver modules
fine. As this project implements only the uniprocessor W32 kernel all the
processor locking functions and structures such as
<span class="constant">KSPIN_LOCK</span> etc. can be safely implemented as

<p>Asynchronous callbacks registered for
<span class="constant">IO_WORKITEM</span>s are passed as GLib idle
functions by <span class="function">g_idle_add_full()</span>. Although they
will probably never be executed during the project's non-interactive batch
executions it is the responsibility of the W32 driver implementation to
complete all the pending tasks before its W32 shutdown. Such a W32 shutdown
is done during cleanup of the project's execution by
<span class="function">captive_shutdown()</span>.</p>
<h2 id="paranoia">Paranoia Checks</h2>

<p>A common approach in software development is to implement
many internal sanity checks during the development stage but to produce the
most optimized final release product without those debugging checks.</p>

<p>Facilities for these practices can be seen in the standard
C include files, for example the macro
<span class="function">assert()</span> which gets disabled by the
<span class="constant">NDEBUG</span> symbol used during the final optimized
executable compilation. This project uses the GNOME GLib messaging subsystem
offering sanity checks discarded by the symbols
<span class="constant">G_DISABLE_ASSERT</span> and
<span class="constant">G_DISABLE_CHECKS</span>.
<span class="productname">Microsoft</span> also produces two versions of
its products – regular customers use the "free build" (also
called "retail") while the programmers should develop their code
on the "checked build" product releases.</p>

<p>As this project will always run unknown binary code of proprietary W32
filesystem drivers, the code can never be trusted. Such code even runs in
the same unprotected address space as its controlling UNIX code. Since
there is not enough documentation for the W32 components of the system and
such documentation is also usually misleading, the result can never be
considered a 100% emulation. Even in the final releases all the sanity checks
implemented in this project should remain active as all the project's code
always interacts with unknown and untrusted W32 binaries.</p>
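<p>A check that stays active in release builds can be sketched as below. The
macro <span class="function">SANITY_RETURN_VAL_IF_FAIL()</span> and the
function <span class="function">checked_sum()</span> are hypothetical names
used only for this illustration; the macro is modelled on the GLib
<span class="function">g_return_val_if_fail()</span> style but, as an
assumption of this sketch, it ignores both
<span class="constant">NDEBUG</span> and
<span class="constant">G_DISABLE_CHECKS</span>.</p>

```c
#include <stddef.h>
#include <stdio.h>

/* Count of failed checks, so that callers and tests can observe them. */
unsigned sanity_failures;

/* Hypothetical always-on check: report the failed expression and return
 * from the calling function; it cannot be compiled out. */
#define SANITY_RETURN_VAL_IF_FAIL(expr, val)                      \
    do {                                                          \
        if (!(expr)) {                                            \
            fprintf(stderr, "%s:%d: check failed: %s\n",          \
                    __FILE__, __LINE__, #expr);                   \
            sanity_failures++;                                    \
            return (val);                                         \
        }                                                         \
    } while (0)

/* Example: validate a (buffer, length) pair handed over by untrusted
 * driver code before touching it. Returns 0 on success, -1 otherwise. */
int checked_sum(const unsigned char *buffer, size_t length,
                size_t max_length, unsigned long *sum_out)
{
    unsigned long sum = 0;
    size_t i;

    SANITY_RETURN_VAL_IF_FAIL(buffer != NULL, -1);
    SANITY_RETURN_VAL_IF_FAIL(sum_out != NULL, -1);
    SANITY_RETURN_VAL_IF_FAIL(length <= max_length, -1);

    for (i = 0; i < length; i++)
        sum += buffer[i];
    *sum_out = sum;
    return 0;
}
```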
<p><span class="productname">Microsoft Windows NT</span> code is written in
a foolproof style as it accepts even invalid input values, which
it usually corrects. This makes long-term debugging a pain as it hides
the sources of problems. "Checked build" releases were probably
designed to fix this flaw by strict consistency checks but they did not reach
their goal as such checks are usually missing in the code.</p>

<p>This project has strict consistency checks across all the code to make
the debugging phase easy enough. A failed sanity check is not always
a bug – sometimes it just means the real W32 binary code is more
lenient than could be expected according to the documentation and
such a sanity check gets removed for the next version build. In other cases
the failed sanity check means the execution path for some unexpected
combination of arguments was not yet implemented by this project. It may also
mean a bug, of course...</p>

<p>Last but not least – never miss a possible sanity check as its
later removal is an order of magnitude cheaper than an uncaught
invalid assumption. A failed assertion is not always a bug although it
has to be fixed, of course.</p>
<h2 id="logfile">STATUS_LOG_FILE_FULL</h2>

<p>After writing approx. 1MB of data to an NTFS test partition the NTFS driver
returns the
<span class="constant">STATUS_LOG_FILE_FULL</span> error code for any further write requests.
Apparently this is caused by the fact that this project is
@{[ a_href 'Details.pm#synchronous','single-threaded' ]} and it ignores the spawn
of the parallel journalling thread during <span class="fname">ntfs.sys</span>

<p>Fortunately <span class="fname">ntfs.sys</span> will clear its
journalling log file during filesystem unmount. This project will therefore
remount the volume if <span class="constant">STATUS_LOG_FILE_FULL</span>
is detected to work around the missing journalling thread.</p>
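<p>The remount workaround amounts to a retry wrapper around the write call,
which might be sketched as follows. The <span class="type">struct volume</span>
interface, the fake driver functions and the
<span class="constant">SKETCH_</span> constants are hypothetical; the numeric
status value is believed to match the one in the NT headers but should be
treated as an assumption.</p>

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t nt_status;
#define SKETCH_STATUS_SUCCESS       ((nt_status)0x00000000u)
#define SKETCH_STATUS_LOG_FILE_FULL ((nt_status)0xC0000188u)

/* Hypothetical volume interface; the callbacks stand in for driver calls. */
struct volume {
    nt_status (*write)(struct volume *vol, const void *data, size_t length);
    int (*remount)(struct volume *vol);   /* unmount + mount, 0 on success */
    size_t log_used;                      /* fake journal fill level */
    size_t log_capacity;                  /* fake journal size */
    unsigned remounts;                    /* number of remounts performed */
};

/* Write once; if the journal is full, remount and retry exactly once. */
nt_status write_with_remount(struct volume *vol, const void *data,
                             size_t length)
{
    nt_status status = vol->write(vol, data, length);

    if (status != SKETCH_STATUS_LOG_FILE_FULL)
        return status;
    if (vol->remount(vol) != 0)
        return status;
    return vol->write(vol, data, length);
}

/* Fake driver: the journal only grows because nothing ever empties it. */
nt_status fake_write(struct volume *vol, const void *data, size_t length)
{
    (void)data;
    if (vol->log_used + length > vol->log_capacity)
        return SKETCH_STATUS_LOG_FILE_FULL;
    vol->log_used += length;
    return SKETCH_STATUS_SUCCESS;
}

/* Fake remount: unmounting clears the journalling log file. */
int fake_remount(struct volume *vol)
{
    vol->log_used = 0;
    vol->remounts++;
    return 0;
}
```

<p>Retrying only once keeps a request that can never fit from looping
forever.</p>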
<p>Similar behaviour can be seen during the writing of compressed files:
the file gets written uncompressed and its compression will proceed only
during the final filesystem unmount.</p>

<p>For these reasons it was mandatory to support
@{[ a_href 'Details.pm#parent_connector','transparent volume remounting' ]}.</p>

<h2 id="parent_connector"><span class="constant">ParentConnector</span> volume remounter</h2>

<p>The sandbox master component of this project controls the restarting of
its sandbox slaves containing the W32 filesystem. The goal of the
<span class="constant">ParentConnector</span> component is to transparently
provide a persistent view of files and directories over the sandboxed slaves

<p>In the case of read-only operations it would be simple as we would only need to
save our state of currently opened filesystem objects with their read
file/directory offsets. Write operations can be handled like the read-only
ones as long as all the operations are successful. In the case of a W32
filesystem crash we lose all the past write operations. If we redid
all the write operations we could very easily invoke the same crash.
Therefore we write:</p>

<blockquote class="command">
<p>Filesystem crash broke dirty object: FILE/PATH/NAME</p>

<p>message to syslog and refuse any further operations with this

@{[ doc_img 'dia/parent-connector','Parent Connector' ]}

<p><span class="constant">HANDLE</span> represents a W32 object open in
an existing W32 filesystem. <span class="constant">HANDLE</span> is created
on demand according to the saved state of the object (such as its
pathname). Even the whole <span class="constant">VFS</span> sandbox slave
is spawned on demand if some object operation requests it.</p>
<p>A W32 filesystem crash can obviously occur at any moment; it generates the
@{[ a_href 'http://developer.gnome.org/doc/API/2.0/gobject/','GObject' ]}
@{[ a_href 'http://developer.gnome.org/doc/API/2.0/gobject/gobject-Signals.html','signal' ]}
<span class="constant">abort</span>. A successful filesystem unmount
(even as a part of a remount operation) must first be preceded by the
<span class="constant">detach</span> signal to close all existing
W32 <span class="constant">HANDLE</span>s. After they are closed the filesystem
gets the unmount request. Only if all the close operations
succeeded, including the final filesystem unmount, can the signal
<span class="constant">cease</span> be activated to notify all the
dirty (written) objects that they are now clean. During this
<span class="constant">cease</span> signal the project will also
@{[ a_href '#safe_flush','flush' ]} the sandbox commit buffer to its
underlying media.</p>
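<p>The object life cycle driven by these three signals can be summarized as
a small state machine. The C sketch below is only an interpretation of the
description above: the type and function names are hypothetical, GObject
signals are replaced by plain function calls, and the message goes to stderr
instead of syslog to keep the example self-contained.</p>

```c
#include <stdio.h>

enum object_state { OBJECT_CLEAN, OBJECT_DIRTY, OBJECT_BROKEN };

struct tracked_object {
    const char *pathname;       /* saved state used to reopen on demand */
    enum object_state state;
    int handle_open;            /* nonzero while a W32 HANDLE exists */
};

/* A write succeeded: the object now holds data not yet committed.
 * Returns -1 if the object was broken by an earlier crash. */
int object_written(struct tracked_object *obj)
{
    if (obj->state == OBJECT_BROKEN)
        return -1;
    obj->handle_open = 1;       /* the HANDLE is created on demand */
    obj->state = OBJECT_DIRTY;
    return 0;
}

/* "abort": the W32 filesystem crashed; past writes of dirty objects are lost. */
void object_on_abort(struct tracked_object *obj)
{
    obj->handle_open = 0;
    if (obj->state == OBJECT_DIRTY) {
        fprintf(stderr, "Filesystem crash broke dirty object: %s\n",
                obj->pathname);
        obj->state = OBJECT_BROKEN;
    }
}

/* "detach": close the HANDLE before the filesystem gets unmounted. */
void object_on_detach(struct tracked_object *obj)
{
    obj->handle_open = 0;
}

/* "cease": the unmount succeeded, so written data is safely committed. */
void object_on_cease(struct tracked_object *obj)
{
    if (obj->state == OBJECT_DIRTY)
        obj->state = OBJECT_CLEAN;
}

/* Clean and dirty objects can be reopened on demand; broken ones cannot. */
int object_usable(const struct tracked_object *obj)
{
    return obj->state != OBJECT_BROKEN;
}
```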
<p>Objects never written remain in <span class="constant">clean</span>
state and they can be transparently reopened even if W32 filesystem crash