i86 Projects

ChaOS is an organic bootstrapped self-compiling operating system for Intel/AMD platforms, in continuous development since 1995. It is not a Linux derivative. The compiler, linker and source editor which produce the executable image are embedded within the operating system, along with the entire source code and an integral source-level debugger. This makes for a compact development platform which has successfully migrated from 80486 to present-day Intel x64 platforms.

ChaOS is predominantly a text-based system but runs in a choosable VESA graphics mode, so has graphics capability which is used for design software outputting to a CO2 laser. It serves a valuable purpose in preserving custom business software, unaffected by the version changes which plague the mainstream operating system world. ChaOS has evolved to be simplistic yet reliable, and by being different, is completely immune to internet viruses.

6.7.23 - ChaOS over UEFI Revisited bootx64, ox64 and bx64 development projects from 2014. After some bug fixes, all compile OK. bootx64 won't compile with cc9 as a 64-bit fully relocatable image, not sure why right now, but this is not a problem as UEFI will launch this program at an address below 4Gb. bootx64 will launch any project compiled with cc9 to any address in 64-bit flat space. The usual 2Gb application size limit applies due to the use of 32-bit signed offsets for near jumps and calls.
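
The 2Gb limit follows from the rel32 encoding of near calls and jumps: the displacement is a signed 32-bit value measured from the end of the instruction. A minimal sketch of that reach check, in standard C rather than ChaOS C; the function name and parameters are illustrative assumptions, not part of bootx64 or cc9:

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if a near call/jmp with a signed 32-bit displacement, placed at
   address `site`, can reach `target`. The displacement is relative to the
   address of the NEXT instruction, so the instruction length matters
   (5 bytes for the E8/E9 rel32 forms). */
int rel32_reachable(uint64_t site, uint64_t target, unsigned insn_len)
{
    assert(insn_len > 0 && insn_len <= 15);   /* x86 instructions are at most 15 bytes */
    int64_t disp = (int64_t)(target - (site + insn_len));
    return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

Any code and data reached this way must therefore sit within roughly 2Gb either side of the instruction, wherever in 64-bit flat space the image is loaded.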

There is a strong analogy between 32-bit ChaOS with the new 64-bit app launch, and bootx64 over UEFI with the cc9-compiled fully relocatable application. Therefore, as a project I will try to port my first 64-bit ChaOS app m64 (64-bit memory browser) to run as a UEFI app launched by bootx64.

2.7.23 - cc8 default function type Changed {cc8} (ChaOS 64-bit compiler) default function type to call64. This removes the requirement for the call64 function modifier to be used to force {cc8} to compile 64-bit code. 64-bit libraries can now be created simply by passing the 32-bit library code through {cc8}. Also, the default 32-bit ChaOS compiler {cc} will invoke {cc8} for any module in the .link file preceded by a '#'. This streamlines the 64-bit development cycle.

ChaOS has long had a 64-bit pointer type, created by using '#' in the declaration instead of '*'. {cc8} has a command-line flag '/#' to switch 64-bit pointers on, otherwise they are loaded as 32-bit values. This default behaviour means 32-bit source code usually compiles and runs in 64-bit mode no problem (mainly avoiding garbage being drawn into the upper 32 bits of a register used for indirect memory access). Since I defaulted the cc8 stack model to 64 bits, and added boiler-plate code to blank the top bits of 64-bit types when assigned from smaller integers, 64-bit pointers have become stable. As a further tweak to the 64-bit development loop, any module in the .link file preceded by '##' is compiled by invoking cc8 /#.
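
The prefix rules described in this entry can be sketched as a small classifier. This is standard C written for illustration only; the enum, the function name and the idea that a .link entry is a bare string are assumptions, not the actual {cc} source:

```c
#include <assert.h>
#include <string.h>

/* Which compiler invocation a .link module entry selects (names are hypothetical). */
enum tool { TOOL_CC, TOOL_CC8, TOOL_CC8_PTR64 };

/* Classify one module entry and return the module name with its prefix removed.
   No prefix -> 32-bit cc, '#' -> cc8, '##' -> cc8 /# (64-bit pointers on). */
enum tool classify_module(const char *entry, const char **name)
{
    assert(entry != NULL && name != NULL);
    if (strncmp(entry, "##", 2) == 0) {   /* test the longer prefix first */
        *name = entry + 2;
        return TOOL_CC8_PTR64;
    }
    if (entry[0] == '#') {
        *name = entry + 1;
        return TOOL_CC8;
    }
    *name = entry;
    return TOOL_CC;
}
```

Testing '##' before '#' matters, since every '##' entry also begins with '#'.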

1.7.23 - ChaOS 64-bit applications Another ad-hoc idea to simplify 64-bit development: added code to recognise applications compiled with an SL far call64 main() entry point. If found, entry is via a far call to a 64-bit code segment, thus entering the application in native 64-bit mode. With a few tweaks to pass argc and argv as 64-bit values and to provide 16-byte stack alignment, this greatly simplifies the 64-bit development cycle without departing the stable 32-bit ChaOS host.

25.6.23 - Curl Relay Server ChaOS has no https: network stack, and adding one is a major undertaking. To work around this limitation, created an Ubuntu Linux-hosted server daemon listening on a UDP port to accept instructions for building a Curl session. This daemon is liberally based on TFTP, so is capable of posting data to, or receiving data from remote network hosts.

In quick succession, added the facility to manage smtps:// sessions. Coupled with some modifications to njob, pdf invoices and statements can now be attached to a simple email template and posted to an smtps:// server.

27.12.22 - GigaNumbers Added GF type, floating point GigaNumber. Twos-complement arithmetic avoided by pre-processing, such that operands are never subjected to subtraction from a smaller significand. Added native debug watch support for GN and GF types, to make these GigaNumbers easier to visualise and debug.

Created gFPU module, for linking to any process. Provides apps with a private GigaNumber FPU stack. Library functions are x87-style such as fld, fmulp, fstp etc, intuitive for anyone who has programmed for an Intel floating point unit. Experimenting with 4-million-bit floating point registers.

Created strtogf function, to scan improbably long digit strings (and skipping newlines and line-feed characters if required) into GF format, expanding the significand automatically to accommodate all digits presented.

Calculated cube root of 2 with 4 million bits of precision, Newton-Raphson iterated 25 times. Should be sufficient for 1 million decimal places. Largest reference I could find for this number was 20000 digits, and results match Simon Plouffe perfectly; now working on verification of my results.
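
For reference, the Newton-Raphson step for a cube root is x' = (2x + a/x^2)/3. The sketch below runs it in ordinary double precision, purely to show the iteration; it is not the gFPU code, and the function name and starting guess are assumptions. Once close to the root, each step roughly doubles the number of correct bits, and 2^22 is already more than 4 million, which is why a couple of dozen iterations can cover a 4-million-bit significand:

```c
#include <assert.h>

/* Newton-Raphson for the real cube root of a > 0:
   f(x) = x^3 - a, so x_next = x - f(x)/f'(x) = (2x + a/x^2) / 3. */
double cbrt_newton(double a, int iterations)
{
    assert(a > 0.0 && iterations >= 0);
    double x = a;                       /* crude starting guess */
    for (int i = 0; i < iterations; i++)
        x = (2.0 * x + a / (x * x)) / 3.0;
    return x;
}
```

Calling cbrt_newton(2.0, 25) settles at the double-precision limit after only a handful of steps; the remaining iterations change nothing at this width, but in a multi-million-bit format they are what extend the precision.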

30.11.22 - GigaNumbers Implemented variable-sized large number library with decimal exponent/right-justified significand or binary exponent/left-justified significand. Significand sizes from 64 bits upwards in 32-bit steps. Added functions to convert digit strings into the new formats, auto-expanding the significand if necessary, and functions to convert back to digit strings for display. Added specific debug watch support, defaulting to 80 fractional digits.

Much easier now to develop code to deal with numbers larger than 64 bits (a 64-bit value never exceeds 20 decimal digits).

14.10.22 - Debug watch Finally got around to implementing debug watch. Up to 20 memory locations can be monitored simultaneously with quick delete, undelete, duplicate on single keystrokes. Each watched item is strongly typed to drive a context display which expands structures, arrays etc, according to the typidx in play. Added 128-bit visibility map to context display, so structure members can be hidden/unhidden using DEL/INS keys. Provides a really quick way to sort the wheat from the chaff when monitoring large structures, and avoids window-scrolling.
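
A 128-bit visibility map of this kind can be held in four 32-bit words, one bit per structure member. A minimal sketch in standard C; the type and function names are hypothetical, and the polarity (bit set means hidden) is an assumption rather than the debugger's actual layout:

```c
#include <assert.h>
#include <stdint.h>

#define VIS_BITS 128

/* One bit per displayed structure member; a set bit means "hidden" (assumed). */
typedef struct { uint32_t w[VIS_BITS / 32]; } vismap;

void vis_hide(vismap *m, unsigned i)      /* e.g. bound to the DEL key */
{
    assert(i < VIS_BITS);
    m->w[i / 32] |= UINT32_C(1) << (i % 32);
}

void vis_unhide(vismap *m, unsigned i)    /* e.g. bound to the INS key */
{
    assert(i < VIS_BITS);
    m->w[i / 32] &= ~(UINT32_C(1) << (i % 32));
}

int vis_hidden(const vismap *m, unsigned i)
{
    assert(i < VIS_BITS);
    return (int)((m->w[i / 32] >> (i % 32)) & 1u);
}
```

With an all-zero map every member is shown, so a freshly created watch needs no special initialisation.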

7.10.22 - Crypto library Beginning to pull together various projects into a crypto library. SHA256/SHA512 up and running, along with Drepper wrappers to produce $5$ and $6$ Linux password hash strings.

10.8.22 - WND Introduced app WND to ChaOS process, with boiler-plate code to clean up WNDs for processes normally or abnormally terminated.

21.1.22 - AP invoke Issues arising from AP executing operating system invoke, particularly process creation and destruction including early exit via kill function, now stabilised. Full operating system compile and link works fine when run on AP, whilst leaving CPU0 idling in getmsg() to ensure there is no contention for filesystem access.

Compiling two projects simultaneously on different CPUs may now be possible with some well-placed spinlocks to serialize access to uploadfile and downloadfile.

9.1.22 - AP invoke Added new AP launch command mxp to start AP with a command line, then execute through command parsing, internal and external command search and upload of executable file from disk as required. Works first time, but is of course not thread-safe; however it provides a rich set of commands for exploring multithreading, and determining the best places to insert mutex locks to prevent conflicts between CPUs.

First attempt at such a mutex on invoke itself is a non-starter because invoke calls itself recursively during the processing of batch command files (shell scripts). When stuck in such a spinlock, Break key enters debug_kernel inside the spinlock loop so it's easy to see what is going on. As I said, great for exploring where to place spinlocks!

8.1.22 - 16-byte stack alignment Further tweaked ccx to adjust stack pointer before pushing function call arguments to provide called function with 16-byte aligned stack; also tweaked local variable allocator to round up all local allocations to 16-byte multiples. Tweaked ChaOS exec to provide 16-byte alignment on entry to main function of new process. These quick fixes provide a platform for working with SIMD types as local variables without SIMD alignment faults.
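
Both adjustments come down to the same modulo-16 arithmetic. A minimal sketch in standard C, not the ccx source; the function names are hypothetical, and it assumes the stack must be 16-byte aligned at the point where the last argument has been pushed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Round a local-variable allocation up to the next multiple of 16 bytes. */
size_t round_up_16(size_t n)
{
    assert(n <= SIZE_MAX - 15);          /* avoid wrap-around */
    return (n + 15) & ~(size_t)15;
}

/* Bytes to subtract from the stack pointer BEFORE pushing `arg_bytes` of
   arguments, so that the stack pointer is 16-byte aligned after the pushes
   (assumption: alignment is required at that point). */
size_t arg_padding(uintptr_t sp, size_t arg_bytes)
{
    return (size_t)((sp - arg_bytes) & 15);
}
```

For example, with sp = 0x1000 and 12 bytes of arguments the padding is 4, leaving sp at 0xFF0 after the pushes.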

4.1.22 - SSE and AVX Revisiting Intel CPU SIMD extensions with an expanded assembly language opcode set for SSE which I wrote about 8 years ago but which was never fully grafted into the stable ChaOS compiler. For this I am using a tweaked version of ChaOS compiler (ccx) with a larger opcode database.

Also expanding disassembler to explore VEX instruction encodings.

30.12.21 - rmdbg Improved real-mode debugger to properly support source-line stepping through the bootstrap. Added alternate real-mode code blob for PMRM and RMPM switches and stacks, independent of those within the bootstrap itself. This allows rmdbg to continue running when the bootstrap sectors are loaded. Debug registers DR0-DR7 now used to implement goto function, which protects against code being loaded up from disk over the top of the area being debugged. I can now step through all of the partition table executable, boot sector executable and bootstrap itself (except for critical areas such as CPU mode transitions).

27.12.21 - mutex on debug_kernel Running several CPUs whilst sharing the same IDT reveals lots of limitations in my debugger, especially around breakpointing and source line stepping. CPUs entering debug_kernel can be left looping for a keystroke with no keyboard focus.

As an exercise, I have added the mutex modifier to debug_kernel and the user interface now switches between CPUs which are competing for the debugger in a sane and orderly manner. I notice that when one CPU core is throttled back, APs on the other core are prioritised through the mutex spinlocks. This effect worsens if APs compete through two sequential locks. An AP throttled to 1/8 speed competing with a full-speed AP through two locks gets through both locks only 1/64 of the time. Therefore the order in which CPUs gain access to mutex debug_kernel is heavily skewed towards faster-clocking APs; nevertheless this order provides insight into what happens in real time when all APs are running free.

26.12.21 - thread-safe function sets Quickly building on new spinlock strategy, added extension to the ChaOS compiler to scan for {global-symbol} after mutex function modifier to allow programmer to specify an explicit mutex lock, rather than have the compiler create an implicit lock in the codestream. By specifying the same lock variable, (checked to make sure it is a simple global dword type) a suite of functions can share the same lock. Thus if any CPU enters any function in the suite, all the functions become locked until that CPU exits that function. This is pretty much how spinlocks are used in the real world, but I can now lean on the compiler to ensure that locking and unlocking is done at the right time. The syntax is nice and simple too:

    UL  heaplock;                       //declare a global lock variable
    VD* mutex {heaplock} malloc(UL size){...function-body...}
    VD  mutex {heaplock} free(VD* ptr)  {...function-body...}

Both the implicit and explicit locks use the same mechanism, i.e. lea eax,&lock;call __mutex;, placed at the entry to a mutex function. So it is easy to examine the code stream, find the lock address, and clear any locks held by a given CPU, an essential part of the killAP mechanism.
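
For readers without the ChaOS compiler, the effect of a shared explicit lock can be approximated in standard C11. This is a sketch under assumptions, not the __mutex implementation: the lock here stores the owning CPU number plus one (so that a lock held by a given CPU can be found and cleared), and all function names are hypothetical. On x86, atomic_compare_exchange_strong typically compiles to lock cmpxchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

typedef _Atomic uint32_t spinlock_t;     /* 0 = free, otherwise owner CPU + 1 (assumed) */

static spinlock_t heaplock;              /* one lock shared by the whole function suite */

int lock_try(spinlock_t *l, uint32_t cpu)
{
    uint32_t expected = 0;               /* only succeed if the lock is free */
    return atomic_compare_exchange_strong(l, &expected, cpu + 1);
}

void lock_acquire(spinlock_t *l, uint32_t cpu)
{
    while (!lock_try(l, cpu))
        ;                                /* spin until the owner releases */
}

void lock_release(spinlock_t *l, uint32_t cpu)
{
    assert(atomic_load(l) == cpu + 1);   /* only the owner may release */
    atomic_store(l, 0);
}

/* Clear the lock only if `cpu` holds it, as a kill mechanism would need to. */
int lock_clear_if_owner(spinlock_t *l, uint32_t cpu)
{
    uint32_t expected = cpu + 1;
    return atomic_compare_exchange_strong(l, &expected, 0);
}

/* Two functions sharing one lock: entering either excludes both. */
void *locked_malloc(size_t size, uint32_t cpu)
{
    lock_acquire(&heaplock, cpu);
    void *p = malloc(size);
    lock_release(&heaplock, cpu);
    return p;
}

void locked_free(void *p, uint32_t cpu)
{
    lock_acquire(&heaplock, cpu);
    free(p);
    lock_release(&heaplock, cpu);
}
```

The compiler modifier removes the need to write the acquire and release calls by hand, which is where hand-rolled versions like this one usually go wrong (an early return that skips the release, for instance).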

My mutex function modifier is now working as I had hoped when I added this tweak to the ChaOS compiler in 2011. Looking forward to crash-testing with multiple CPUs contending for hardware resources!

26.12.21 - thread-safe heap allocation Whilst the mutex modifier on malloc and free produces a heap which serializes memory allocation and deallocation for concurrent CPUs, earlier code which tests the variable inmalloc fails - CPUs are reading inmalloc as 1 even after another CPU has cleared inmalloc and executed a clflush and mfence, plus executed xchg (which has an implicit LOCK prefix) to examine memory at inmalloc.

I am rather mystified by this, because I am trying the same methods to examine a memory operand as used in the mutex code, and expected all CPUs to see the same value, but this is evidently not happening. If inmalloc is examined inside the locked code section then of course inmalloc always reads as expected. There seems to be a difference, maybe in the way the CPUs prefetch data in a linear codestream compared to a loop. I could place inmalloc in uncached memory but that is not the point - my spinlock variables are in cached memory.

24.12.21 - mutex and offload Revisiting mutex and offload function modifiers which I added to the ChaOS compiler in 2011. mutex is a function modifier which allocates a 32-bit lock and inserts a call to __mutex (a simple spinlock) at the head of a function, and code to clear the lock as the function exits. mutex was still working as intended to serialize access to a function for CPU0 and one AP, but was throwing interesting and variable results when coping with 3 or 4 CPUs spinning on the lock. Tweaking and testing the assembly code of a spinlock really gets your head around the mechanics of SMP programming. The difference between cmpxchg and lock cmpxchg is the difference between success and failure. After thorough testing of 4 CPUs contending to allocate and free thousands of heap memory blocks, added a mutex wrapper to malloc and free. Now APs can safely allocate and deallocate memory on the system heap.

offload is a function modifier which inserts a call to __offload at the start of a function. If an AP is available, __offload copies stack arg count, stack args and the address of the function entry point to a work list, then returns without executing the function body. So long as an AP is watching the work list head and tail pointers, it can extract work items on to its own stack and perform the function call. Still works as intended, with the obvious limitation that stack args cannot safely refer to data in another CPU stack. Suitable for offloading simple functions such as setpixel and fontchar.
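
The work list described above behaves like a single-producer, single-consumer ring. A minimal sketch in standard C11, assuming one posting CPU and one polling AP; the structure layout, sizes and names (including demo_setpixel, a stand-in for a real offloaded function) are illustrative assumptions, not the __offload implementation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define MAX_ARGS  4
#define LIST_SIZE 16                      /* power of two, so counters may wrap */

typedef void (*work_fn)(const uint32_t *args, unsigned argc);

typedef struct { work_fn fn; unsigned argc; uint32_t args[MAX_ARGS]; } work_item;

typedef struct {
    work_item   items[LIST_SIZE];
    atomic_uint head;                     /* next slot the producer fills   */
    atomic_uint tail;                     /* next slot the consumer empties */
} work_list;

/* Producer: copy the arguments BY VALUE and publish the item. Returns 0 if
   the list is full, in which case the caller should run the function itself. */
int offload_post(work_list *wl, work_fn fn, const uint32_t *args, unsigned argc)
{
    assert(fn != NULL && args != NULL && argc <= MAX_ARGS);
    unsigned head = atomic_load_explicit(&wl->head, memory_order_relaxed);
    unsigned tail = atomic_load_explicit(&wl->tail, memory_order_acquire);
    if (head - tail == LIST_SIZE)
        return 0;
    work_item *it = &wl->items[head % LIST_SIZE];
    it->fn = fn;
    it->argc = argc;
    memcpy(it->args, args, argc * sizeof args[0]);
    atomic_store_explicit(&wl->head, head + 1, memory_order_release);
    return 1;
}

/* Consumer: take one item on to the local stack and call it. Returns 0 if idle. */
int offload_poll(work_list *wl)
{
    unsigned tail = atomic_load_explicit(&wl->tail, memory_order_relaxed);
    unsigned head = atomic_load_explicit(&wl->head, memory_order_acquire);
    if (tail == head)
        return 0;
    work_item it = wl->items[tail % LIST_SIZE];
    atomic_store_explicit(&wl->tail, tail + 1, memory_order_release);
    it.fn(it.args, it.argc);
    return 1;
}

/* Hypothetical offloaded function: records the last pixel it was asked to set. */
static uint32_t last_x, last_y;
static void demo_setpixel(const uint32_t *a, unsigned n)
{
    if (n >= 2) { last_x = a[0]; last_y = a[1]; }
}
```

Because the arguments are copied by value, a pointer argument would still refer to the posting CPU's stack, which is the limitation noted above.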

23.12.21 - Fixed IPI to shut down AP Having implemented int 0xf0 some time ago as route to stopAP (return AP to halted state and destroy all AP processes), added an executive killap command to fire int 0xf0 via an interprocessor interrupt (IPI). Moved destroy AP process function to a system poll function which watches for a killCPU flag, rather than having the AP call destroyprocess directly (which is not thread-safe).

9.9.21 - NTFS long file paths After implementing chdir for NTFS filesystem, the 128-character limitation in ChaOS LDRIVE->cd (current directory) is quickly exposed. LDRIVE reworked to use a dynamically-allocated buffer for current directory string, with initial allocation of 1024 bytes.

25.8.21 - VB.NET, C# structures, or lack thereof Discovering VB.NET's inability to access a simple #pragma pack(1) structure, translated multicast receiver project to C# using a verbose struct containing no member arrays or references. Added a prototype UDP socket receive handler with Marshal.Copy in and out to examine incoming UDP multicast at byte level.

18.8.21 - ChaOS {cc} compiler Nailed a very very old bug in the ChaOS compiler today around parsing function prototypes which contain pointers to functions. I added pointer to function to {cc} ten or fifteen years ago but have always had trouble defining nested function argument lists. As a consequence, I generally worked around function pointer problems by casting and passing function addresses as a UL.

This bug has become a real pain since I added system timers to ChaOS a couple of months ago, with strong prototype matching. Compilation of the new code always takes two attempts, so I knew for the first time that this bug revolved around non-initialised data.

The bug was located in an innocuous test for VD arguments occurring elsewhere than the first (and only) argument in an argument list. As if to show the benefit of chasing down the most minor of bugs, the mysterious crash and stack runaway during compilation which trashed NVRAM on my DL360/DL380 servers has not reoccurred since this bug fix.

I might say it has always been a feature of ChaOS that bugs can take the system down. Generally this forces more care when programming, and produces stable code more quickly than relying on operating system protection features to catch execution faults. The key to this approach is the ChaOS 10-20 sec cold reboot time.

18.8.21 - NTFS 8.3 filenames Just amazed to see that pretty much all NTFS directory entries are duplicated as 8.3 DOS-style filenames. Maybe it's to comfort Bill Gates as he goes to sleep each night. For the plebs this means pretty much every Windows NTFS directory takes up nearly double the disk space it really needs, with quantifiable energy consequences for the planet. Yes I know this feature of NTFS can be turned off, but why is it not turned off by Windows Update??????????? I have a server with 870,000 files but 1.7 million directory entries.

17.8.21 - NTFS support Created ntfs.drv, first load-on-demand filesystem driver for ChaOS (i.e. loads when NTFS partition is detected). This is intended as a read-only browser for partitions created by Windows operating systems. Hooked into dual-port drives in D2700 SAN array, ChaOS can now browse live Windows Server NTFS partitions through the redundant i/o controller. There are cache issues which prevent file updates on Windows Server from being immediately visible through the second controller. Various flush-cache and synchronize-cache commands to the HP Smart Array controller have no effect. I have no way yet of forcing a cache flush on the Windows side (!Itanium!). However a full adapter reset on the second controller does the trick, and file updates on the primary controller become visible.

The issue is likely to be caching in the RAM of the HPSA adapter, rather than the individual drive caches. So far I have found no clue as to how to instruct HPSA to invalidate the controller RAM.

1.8.21 - HP Smart Array 64-bit lba support Added READ_CAPACITY16, READ_16 and WRITE_16 functions to this driver. ChaOS now able to read/write physical or logical LUNs over 2Tb.
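
The 2Tb barrier comes from the 32-bit LBA field of the 10-byte commands: 2^32 blocks of 512 bytes is 2 TiB (512-byte blocks assumed). The 16-byte commands carry a 64-bit big-endian LBA in bytes 2-9 and a 32-bit transfer length in bytes 10-13. A minimal sketch of building such a CDB in standard C; the function name is hypothetical and this is not the ChaOS driver code:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SCSI_READ_16  0x88
#define SCSI_WRITE_16 0x8A

/* Fill a 16-byte CDB for READ(16) or WRITE(16). Multi-byte fields in a
   SCSI CDB are big-endian, whatever the byte order of the host CPU. */
void build_rw16(uint8_t cdb[16], uint8_t opcode, uint64_t lba, uint32_t blocks)
{
    assert(opcode == SCSI_READ_16 || opcode == SCSI_WRITE_16);
    memset(cdb, 0, 16);                    /* flags, group number, control = 0 */
    cdb[0] = opcode;
    for (int i = 0; i < 8; i++)            /* bytes 2..9: 64-bit LBA */
        cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)            /* bytes 10..13: transfer length in blocks */
        cdb[10 + i] = (uint8_t)(blocks >> (24 - 8 * i));
}
```

READ CAPACITY(16) is issued differently, as a service action of the SERVICE ACTION IN(16) command (opcode 0x9E, service action 0x10), so it is left out of this sketch.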

31.7.21 - Improved dynmatchtype Revised type matching performed during dynamic-linking to support structure/union nesting up to a depth of 30. Previous ChaOS implementation (vintage year 2000) had no support for dynamic-linking of nested aggregate types. This change allows test applications to link to the complex (and often messy) structs in Linux kernel source code, with the security of ChaOS strongly-typed dynamic linkage.

30.7.21 - HP dl380/dl360 g7 nvram vulnerable to stack runaway My dl380 g7 died on 9.7.21, (or so I thought) after a program stack runaway during development. Symptoms were fault lights for both processors and mains power, and failure to execute BIOS ROM. iLO seeing no system faults, and processors work fine in another machine so diagnosis is a mainboard failure.

These machines are very cheap, so I carried on development during July with a dl360 g7 - until hitting the exact same problem - after a stack runaway. The diagnostic lights are a bit different on the dl360, so here we have just a flashing red mains power light.

Necessity is the mother of invention, so I returned to the dead dl380 and browsed a few internet pages, as I am sure this characteristic of Proliant servers must have been seen by others before me. One such page describes low NVRAM battery voltage as a possible cause, writing garbage to NVRAM when writing server logs and leaving the machine unbootable. I was thinking that maybe my runaway program trashed the NVRAM too.

System Maintenance Switch (SMS) 6 is described as the NVRAM reset, however doesn't fix this problem when set on its own.

The REAL fix is to set System Maintenance switches (SMS) 1,5 and 6 to ON, and REMOVE THE NVRAM BATTERY. On power-up the error lights may flash quickly from their amber or red states to green (as the NVRAM is cleared?) and mainboard buzzer wails until you power the machine down again. Resetting the SMS and refitting the NVRAM battery (making sure it is producing 3Volts!) completes the job. Just have to reset the time and date, plus boot controller and boot logical volume to get these Proliants up and running again.

7.7.21 - HP SA P411 with D2700 25xSFF enclosure Exploring SAN storage array with recently-developed ChaOS driver. HPSA report LUNs inquiry produces just two physical device entries when attached to D2700 25xSFF SAS enclosure. At first I thought these two LUNs were SCSI ports, to be probed for the disk drives beyond. However the rest of the disk pack is accessible by simply probing the succeeding LUNs (advancing byte 6 of the SCSI-3 address).

24.6.21 - HP dl380 g7 superfast ChaOS reboot to {rdp} ChaOS reboot time on this server now just 19 seconds between reboot command and remote desktop connection coming back online. Hewlett Packard BIOS takes at least 5 minutes to process a reboot via iLO3 and SSL. This is a quantum leap for ChaOS driver development on enterprise hardware. The trick is to examine the netxtreme TX and RX context during driver initialization. If already running from a previous session, my driver picks up where the previous boot left off. The NetXtreme CPUs can be stopped, reloaded with firmware and restarted - they pick up the previous context and crack on. For this to work, memory allocations for network frames need to be identical to the previous boot so CPU and NIC memory structures remain in synch. Software can make this happen or issue a warning.

24.6.21 - HP Smart Array P410i driver MPT SAS (CISS) driver now up and running.

14.6.21 - {cc} preprocessor ChaOS compiler preprocessor revised to handle constant-expression or defined after #if. Also added #elif which I have never used but which occurs quite a lot in Linux header files. Logic flow adjusted to properly handle nested #if...#endif, again a construct I have found little use for. The nesting of preprocessor conditions can be elegantly handled using an integer accumulator as a binary stack (LIFO):

For #if, #ifdef and #ifndef the accumulator is shifted left and a 0-bit added to indicate a true condition, or a 1-bit to indicate false.

For #endif the accumulator is shifted right, to recover the result of the outer nesting level (if any).

For #else the lowest bit of the accumulator is inverted.

For #elif the lowest bit of the accumulator is replaced with true (0) or false (1) bit as above.

The above flow-control directives are processed whether or not the current conditional compilation state is true.

Any time the integer accumulator is non-zero, preprocessor directives other than those above, and source lines, are skipped; also, expressions following #if and #elif are ignored. Those expressions could contain macros which have been skipped due to earlier conditional preprocessor directives; in such case evaluation of the expression is not possible. In any case the expression is irrelevant at this point, because there is at least one 1-bit somewhere else in the accumulator.
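
The accumulator rules above can be sketched in a few lines of standard C. This is an illustration of the scheme as described, not the {cc} source; the names are hypothetical, a depth counter has been added only to catch unbalanced directives, and a 32-bit accumulator limits nesting to 32 levels:

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint32_t acc;      /* one bit per nesting level: 0 = true, 1 = false */
    unsigned depth;    /* added here for error checking only */
} cond_state;

/* #if, #ifdef, #ifndef: shift left and add 0 (true) or 1 (false). */
void pp_if(cond_state *s, int cond)
{
    assert(s->depth < 32);
    s->acc = (s->acc << 1) | (cond ? 0u : 1u);
    s->depth++;
}

/* #else: invert the lowest bit. */
void pp_else(cond_state *s)
{
    assert(s->depth > 0);
    s->acc ^= 1u;
}

/* #elif: replace the lowest bit with the new condition. */
void pp_elif(cond_state *s, int cond)
{
    assert(s->depth > 0);
    s->acc = (s->acc & ~1u) | (cond ? 0u : 1u);
}

/* #endif: shift right to recover the outer nesting level. */
void pp_endif(cond_state *s)
{
    assert(s->depth > 0);
    s->acc >>= 1;
    s->depth--;
}

/* Source lines and other directives are processed only when no bit is set. */
int pp_active(const cond_state *s)
{
    return s->acc == 0;
}
```

A caller would skip evaluating an #if expression whenever pp_active is already false, passing any value for cond, since an outer 1-bit keeps the accumulator non-zero regardless. Note that this sketch, like the rules it follows, does not remember whether an earlier branch of an #if/#elif chain was already taken; handling that case would need one more bit of state per level.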

My ChaOS compiler now makes greater headway into header files in the Linux source tree before hitting errors. But a large part of the early-#include files in the Linux project contain #pragma directives and other code which is specific to the GCC compiler, i.e. irrelevant on my platforms.

2.6.21 - ChaOS BCM5709 driver on HP DL380 g7 ChaOS RDP now running flawlessly on this HP server. Time to get back on to the MPT SAS driver.

31.5.21 - ChaOS BCM5709 driver OK, So there is a second magic bullet needed to keep the receiver ring running. When the cpus are running, they require a regular update to a specific register, a "heartbeat" from the operating system. Otherwise they assume the operating system is down and go into non-OS mode. Without the "heartbeat", the receiver ring may run fine, it may not start, or it may freeze after a handful of packets.

The cpus can undergo a firmware upload and restart without corrupting the receiver context in NIC memory. They simply carry on where they left off. So the ChaOS BCM5709 driver now checks for an existing receiver context on startup. If found, the receiver ring from the previous session is resumed. This is a smoother solution than a full chip reset after a warm reboot.

29.5.21 - ChaOS BCM5709 driver A very steep learning curve, if I had known beforehand how complex these BroadCom network adapters are, I probably would not have attempted to write this driver. BroadCom BCM5709s have 5 MIPS processors and 2 RV2P processors which are totally dead when the system starts up, unless a PXE boot has been executed. So from a standing start, register twiddling counts for nothing until the cpus have been loaded with code and started.

Thanks to the Linux source code, I realised a couple of days ago that the way to go is to upload firmware from a file straight into adapter memory, (rather than trying to read it from NVRAM), and then start the cpus. Uploading the firmware is straightforward, since the ChaOS compiler handles both big-endian and little-endian integers implicitly. What took longest was to work out how to get the receiver rings working. The magic bullet is to start the cpus before setting up the receiver ring context.

22.5.21 - ChaOS BCM5709 driver I have to admit, this is one of the most difficult projects I have undertaken. There is no documentation for this chip, so the only way to get a handle on how it works is to pore through thousands of lines of open source code, God Bless Linus Torvalds.

I decided to try to push the main Linux header bnx2.h for the chip (about 7000 lines of structures and macros) through the ChaOS compiler. This necessitated a rewrite of the compiler preprocessor, to handle line breaks within macros, especially within function macro arguments. This throws up issues with source line numbering and source file display when debugging, but happily that all went well. Crucially the ChaOS compiler now handles macros which expand to produce more macros, a common tool of Linux coders.

13.5.21 - ChaOS !PXE HP DL380 poses a problem for RDP because I have no driver for the BroadCom network adapters on this server. Furthermore, there is a distinct lack of documentation around the BCM5709, so writing a driver is going to be difficult.

I can see that a PXE boot on one of the DL380 g7 BroadCom BCM5709s would be a way to install a small system browser, and have a look at the internals of a working adapter.

This took very little time to implement, because all the necessary tools are already in the ChaOS bootstrap. From the outset, the ChaOS bootstrap establishes 4Gb-1 real-mode data segment limits, allowing access to the first 4Gb of memory in the machine from real-mode code. A quick hack of the ChaOS bootstrap provided a small executable to be loaded by PXE at 0x7c00. Then a quick hack to my Itanium remote boot code provided the BOOTP and DHCP packets to persuade the DL380 g7 to TFTP this little program and execute it.

Once you have a few Int 10h display functions and Int 16h getkey(), it is easy to begin browsing the BCM5709 registers, usually around mmio 0xf4000000.

12.5.21 - ChaOS RDP on SuperMicro X7DVL-E ChaOS remote desktop (RDP) is fast becoming the modus operandi for development on new platforms. To develop device drivers remotely, it is essential to have a network stack up and running early in the boot process, i.e. before the development driver starts to load.

My rough and ready solution is to identify a specific PCI device (i.e. a suitable network adapter) whilst scanning for PCI buses. Because each bus and sub-bus have to be checked for PCI bridge devices, an extra check to identify the target device allows it to be placed in a pci priority list. Then call createPCIdevice() for each priority device before performing scanPCI(), (now modified to skip the priority devices, obviously).

This allows debug breakpoints to be placed in the device driver init code, and RDP will break straight into the remote debugger.

8.5.21 - ChaOS on SuperMicro X7DVL-E and HP DL380 g7 Expanded my ChaOS testbeds today with these two classic machines. The SuperMicro X7DVL-E has 2 x PCI, 2 x PCIe and 2 x PCI-X slots, so handles a wide range of adapter cards. The HP DL380 g7 has the embedded LSI SAS controller and 8-way SFF cage to further my work on SAS MPT.

4.5.21 - ChaOS on HP bl460c The current push around PCI/MSI is towards SAN storage on SAS drive arrays. Today I used a HP P400 controller and 8-way drive cage from an old HP DL380 G5 to place a ChaOS boot partition on a 2.5" SAS disk. Although I can run Ultra-320 SCSI drives using a native ChaOS driver for the LSI 1030, I am still a long way off a native driver for MPT-SAS. ChaOS however allows the creation of logical drives accessed via BIOS Int 13h, and using this route, for the first time I have formatted and populated a SAS drive using x64 rather than one of my IA64 platforms. Then I transferred the drive to a bl460c blade server, and used the HP Smart Array controller to configure this disk as a logical drive.

First attempt to boot resulted in a blank screen, which I quickly realised was because I had created the ChaOS boot partition with an unsupported VESA mode. Second attempt I had full control, the first chance to see a c7000 blade server from the baremetal perspective. I have to admit I was surprised by the number of PCI buses presented, 25 in total. This is an insight into the complexity of the c7000 backplane.

30.4.21 - MSI-X Moving on quickly from my MSI implementation, I have now added MSI-X to the mix, tested and working on the handful of MSI-X-capable devices I have in my development platforms. I am so glad I dropped the ACPI ball many years ago, stumbling on with PCIIRQs for at least 10 years. MSI/MSI-X makes all the old hard-wired interrupt controllers redundant, a dream for a systems programmer.

28.4.21 - PCI priority Revised PCI scan to enumerate PCI buses before scanning each for devices. ChaOS remote desktop means I now have a range of development platforms moving forward on a broader front. Remote desktop requires at least a working network device and driver. However when writing and debugging drivers for other devices on a remote ChaOS node, it is essential that at least one network adapter is initialized before the first debug breakpoint.

The solution is simply to make a note of network controllers whilst scanning for PCI bridge devices, then initialise these devices before executing the full PCI device scan.
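
The two-pass ordering can be sketched as follows, in standard C over an already-discovered device table. The structure and function name are hypothetical and the real code calls createPCIdevice() rather than recording an order; the one fixed fact used is that PCI base class 0x02 identifies a network controller:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_CLASS_NETWORK 0x02            /* PCI base class code: network controller */

typedef struct {
    uint8_t bus, dev, fn;
    uint8_t base_class;
    int     init_order;                   /* -1 until initialised */
} pci_dev;

/* Pass 1 initialises network controllers noted during bus enumeration;
   pass 2 initialises everything else, skipping devices already done.
   Returns the number of devices initialised. */
size_t init_with_priority(pci_dev *devs, size_t n)
{
    int order = 0;
    assert(devs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        devs[i].init_order = -1;
    for (size_t i = 0; i < n; i++)        /* priority pass */
        if (devs[i].base_class == PCI_CLASS_NETWORK)
            devs[i].init_order = order++;
    for (size_t i = 0; i < n; i++)        /* full scan, skipping priority devices */
        if (devs[i].init_order < 0)
            devs[i].init_order = order++;
    return (size_t)order;
}
```

With the network adapter guaranteed to come first, a breakpoint in any later driver's init code is reached with the remote link already up.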

16.4.21 - Message Signalled Interrupts(MSI) Introduced MSI as the default mechanism for any PCI device reporting MSI capability. Introduced arbitrary non-shared irq allocation for MSI-capable PCI devices in the range irq16->irq31. Expanded IDT64 to handle these interrupts when in 64-bit mode.
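
On x86 an MSI is just a memory write: the device writes a data value containing the interrupt vector to an address in the 0xFEE00000 range that encodes the destination local APIC. A minimal sketch in standard C for the simplest case (fixed delivery, edge-triggered, physical destination); the function names are hypothetical, and the linear mapping from irq16..irq31 to a vector is an assumption, not necessarily how ChaOS allocates vectors:

```c
#include <assert.h>
#include <stdint.h>

typedef struct { uint32_t address; uint16_t data; } msi_message;

/* Compose the values to program into a device's MSI capability registers.
   Address bits 19:12 carry the destination APIC ID; data bits 7:0 carry the
   vector. All other fields are left at zero (fixed delivery, edge-triggered). */
msi_message msi_compose(uint8_t dest_apic_id, uint8_t vector)
{
    msi_message m;
    assert(vector >= 0x10);               /* vectors 0-15 are not valid for the local APIC */
    m.address = 0xFEE00000u | ((uint32_t)dest_apic_id << 12);
    m.data = vector;
    return m;
}

/* Hypothetical mapping of the non-shared irq16..irq31 range on to vectors. */
uint8_t msi_vector_for_irq(unsigned irq, uint8_t vector_base)
{
    assert(irq >= 16 && irq <= 31);
    return (uint8_t)(vector_base + (irq - 16));
}
```

Since each device gets its own vector, the handler no longer has to poll every device sharing a line to find out which one raised the interrupt.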

MSI draws the curtain on PCIIRQ, PCI IRQ swizzling and ACPI tables, all dreadful for programmers. Within a couple of hours of this step-change I have eSATA cable connect-disconnect interrupts working, a feat that has eluded me whilst my SATA controller (hosting my development filesystems, compiler, linker etc) was sharing a PCI IRQ with USB and network controllers. PCI IRQ sharing does work - but coding errors during development easily disrupt filesystems when working the primary SATA controller IRQ. Therefore my current SATA controller code has evolved to use i/o polling.

PCI-E devices have no PCI IRQ lines, so either use polled i/o or Message Signalled Interrupts.

10.4.21 - GPT Disk Partitioning Added gptutils to my disk partitioning app {part}, to begin to properly exploit the potential of multiple partitions on one hard drive. As well as the obvious features such as add GPT partition/Delete GPT partition etc, I can now perform GPT partition cloning, resizing and shifting up or down within the constraints of other GPT partitions on the drive. All data copy operations are protected by a read/write/read/verify sequence.

14.3.21 - Legacy 32-bit ChaOS and LVD SCSI Discovering that Microsoft Win 10 now prevents mapping to network drives on Win XP platforms (non-secure Samba version?), I realise that it is time to retire my trusty 32-bit Dell Precision 450 workstation, running Windows XP Pro from the year dot.

I ran ChaOS on this machine more than a decade ago and wrote a partial driver for the LSI 1030 LVD SCSI adapter within. I dabbled with Message Passing Technology (MPT) without really understanding what was happening, but managed to suck some data off a SCSI disk and left it at that.

Realising that today's SAN controllers are derived from this early MPT-Fusion adapter, I thought it would be worth reviving the old driver before dropping the Dell Precision 450 in the skip (dumpster). This proved easier than I had imagined; just a couple of tweaks to the current ChaOS source code were needed to prevent illegal opcode exceptions (accessing processor MSRs not present on the old Pentium 4/Xeon). Recompiling the old LSI driver was completely straightforward. With a little work, I now have a working driver for LVD SCSI, ChaOS cfs filesystem running etc. It is slow at the moment because I have yet to code the SCSI speed negotiation phase.

Interestingly the Dell Precision 450 has one PCI-X slot. I already have an old LSI 1068X MPT SAS adapter, so ChaOS will shortly move on to read and write SAS drives.

23.12.20 - ChaOS RDP Spent a week gathering up threads of i86/ChaOS development from the past couple of years. HDMI external monitor autodetect, initialize and disconnect has been part of my Intel GMA driver for a couple of years, but stalled when both my Haswell i-4010u laptops died suddenly. Moving ChaOS to the HP Elitebook 8470p platform involved derating the Haswell code back to Sandy Bridge which took about a week; blank screens are a feature of GMA programming which of course makes coding and debugging more of a challenge.

This week I brought a discarded Lenovo T410 laptop into the frame, as the GMA looks very similar to the Elitebook. After a couple of days of same-old fumbling with the LVDS out of synch, I decided to overhaul the ChaOS VESA text screens, adding a shadow character/attribute buffer; this paves the way for a remote desktop function using a network link.

As it happens, the Lenovo T410 (Iron Lake) GMA is a pig when it comes to reprogramming the FDI TX and RX registers; try as I may, I can only bring the LVDS screen back to synch about once in every four attempts. With no method yet to determine whether the screen is in synch or not, I have yet to find a programmatic solution.

This blank-screen environment has propelled development of a remote ChaOS desktop at a very fast pace. All up and running now with complete remote control including remote debugging, not on the host machine but by simply invoking the inbuilt ChaOS debugger on the remote machine (network link VESA shadow buffer can return either of two full VESA screens).

Currently modifying the network protocols to run this remote desktop through a VPN; remote discovery is through multicasts now rather than broadcasts.

3.7.20 - Reversing ARM network stack into i86/chaos I hardly ever post notes here, because i86/chaos is the mother platform on which all my development compilers are hosted. But development of a new network stack within the Raspberry Pi project has been so rapid and so successful, I felt bound to try to reverse the Pi code back into i86/chaos. In just one day I have the Pi stack running alongside my i86 stack, up to LNK and ARP drvs with PKB packet logging at each level. Of course there have been a few source code tweaks to make this work, but essentially this is code developed on Raspberry Pi now running on i86/chaos.

To avoid disruption to ChaOS itself, I have compiled display code from the Pi project into a new network stack driver, with a couple of simple functions to direct output to the native i86/chaos display.

In the Raspberry Pi project, DEVS are dynamically allocated and placed in a linked list, whereas in i86/chaos DEVs are in a static table. Because the network stack DEVs are essentially protocols rather than hardware handlers, I have established a separate linked list which uses the Raspberry Pi initdev code. I call the resulting list the DEVx list.

This is possible because a couple of years ago I recoded all DEVs in i86/chaos with prepended *prev and *next pointers, so the option of creating lists of DEVs (aside from the inherent parent/child/child chain pointers) has been present but so far unused.
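A minimal sketch of how a separate list can hang off prepended link pointers. Only the *prev and *next members come from the notes above; the DEV layout, the name field and the devx_append/devx_remove helpers are hypothetical and are not the ChaOS or Raspberry Pi initdev code.

```c
#include <stddef.h>

/* Hypothetical DEV layout: only the prepended link pointers come from the text. */
typedef struct dev {
    struct dev *prev;
    struct dev *next;
    const char *name;      /* illustrative payload */
} DEV;

/* Append d to the tail of the list whose head pointer is *head. */
static void devx_append(DEV **head, DEV *d)
{
    DEV *tail = *head;
    d->next = NULL;
    if (tail == NULL) { d->prev = NULL; *head = d; return; }
    while (tail->next) tail = tail->next;
    tail->next = d;
    d->prev = tail;
}

/* Unlink d from the list, for example when its driver is removed. */
static void devx_remove(DEV **head, DEV *d)
{
    if (d->prev) d->prev->next = d->next; else *head = d->next;
    if (d->next) d->next->prev = d->prev;
    d->prev = d->next = NULL;
}
```

Because the links sit at the front of every DEV, the same two helpers would work for any DEV regardless of what follows the pointers.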

I have confined this alternate network stack to one DRV lnk.drv which when deleted causes the existing network stack to revert to form. The ability now to generate network packets on the ARM platform and catch them on x86, using near-identical source code, is another great addition to the toolbox.

Public i86 projects:
{armc1} {vc2}

ChaOS source language is C, with inline assembler in asm {....} blocks. C types within the source code are enhanced by short-form aliases, to improve readability and speed compilation. These aliases are:

    CH  signed char
    UC  unsigned char
    SI  signed short, 16-bit little-endian
    UI  unsigned short, 16-bit little-endian
    Sm  signed short, 16-bit big-endian
    Um  unsigned short, 16-bit big-endian
    SL  signed long, 32-bit little-endian
    UL  unsigned long, 32-bit little-endian
    SM  signed long, 32-bit big-endian
    UM  unsigned long, 32-bit big-endian
    SQ  signed long long, 64-bit little-endian
    UQ  unsigned long long, 64-bit little-endian
    SP  signed long long, 64-bit big-endian
    UP  unsigned long long, 64-bit big-endian
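For readers on a standard C compiler, here is a rough sketch of how the little-endian aliases might map onto fixed-width types on a little-endian x86 host, and how a big-endian field could be fetched by hand. The typedefs and the get_Um/get_UM/get_UP helpers are assumptions for illustration; in ChaOS the compiler handles the big-endian types itself.

```c
#include <stdint.h>

/* Assumed mapping of the little-endian aliases onto fixed-width types. */
typedef int8_t   CH;
typedef uint8_t  UC;
typedef int16_t  SI;
typedef uint16_t UI;
typedef int32_t  SL;
typedef uint32_t UL;
typedef int64_t  SQ;
typedef uint64_t UQ;

/* Hypothetical helpers: read a big-endian (Um/UM/UP style) field byte by byte. */
static UI get_Um(const UC *p)
{
    return (UI)((p[0] << 8) | p[1]);
}

static UL get_UM(const UC *p)
{
    return ((UL)p[0] << 24) | ((UL)p[1] << 16) | ((UL)p[2] << 8) | p[3];
}

static UQ get_UP(const UC *p)
{
    return ((UQ)get_UM(p) << 32) | get_UM(p + 4);
}
```

Big-endian fields of this kind turn up in network headers and SCSI command blocks, which is where the Um/UM/UP forms earn their keep.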

28.12.17 - A proper DEFLATE algorithm Fixed an error in my coding of DEFLATE when scanning match chains - 258-byte matches were being dropped, instead of being stored in the match table. This fix pushes compression ratio beyond that seen in the DOCX files I am working with, 1.3% better on packages typically compressing to one sixth of original size. At these high compression ratios, the 1.4% gain results in DOCX files which are 10% smaller but which work fine when opened by Microsoft Word16.

Fixing a slight error in the code length frequency distribution squeezes files by a further 0.1%. All this without going down the lazy-match route. The verdict must be that classic DEFLATE algorithms drop potential matches from their hash tables to favour speed over maximum compression.

16.12.17 - A proper DEFLATE algorithm As mentioned earlier, public DEFLATE source code is opaque to me especially when looking at hash table generation. It is largely 16-bit code, written to run in a series of 64k segments. Since memory is no issue in a 32-bit flat linear system, I decided to try a linear table to map occurrences of the byte-triplets which serve as the baseline string-match in DEFLATE. Using 16-bit values to point back into the 32k input buffer, this table is 256*256*256*2 bytes in length, i.e. 32Mb, initialized to all zeros. At each step through the input buffer this table can be checked for a previous occurrence of the triplet (i.e. a non-zero value), otherwise the current offset is saved to the table in 1-based format (i.e. plus one). This produces some fodder to output DEFLATE length and distance codes, with a simple loop to check succeeding input bytes to match longer strings.
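The linear triplet table can be sketched in a few lines. This is an illustration under stated assumptions rather than the ChaOS code: the function name count_triplet_hits is made up, and it only counts repeat triplets instead of emitting length and distance codes.

```c
#include <stdint.h>
#include <stdlib.h>

#define TRIPLETS (256u * 256u * 256u)   /* one 16-bit slot per byte-triplet */

/* Scan buf[0..len), len <= 32768, and count triplets that were seen before.
   table must hold TRIPLETS zeroed 16-bit entries. Offsets are stored 1-based
   so that zero means "not seen yet". */
static unsigned count_triplet_hits(const uint8_t *buf, unsigned len, uint16_t *table)
{
    unsigned hits = 0;
    for (unsigned off = 0; off + 3 <= len; off++) {
        uint32_t key = ((uint32_t)buf[off] << 16) |
                       ((uint32_t)buf[off + 1] << 8) | buf[off + 2];
        if (table[key] != 0)
            hits++;                          /* earlier occurrence at table[key] - 1 */
        else
            table[key] = (uint16_t)(off + 1);
    }
    return hits;
}
```

The table would be allocated once with calloc(TRIPLETS, sizeof(uint16_t)), which gives the 32Mb of zeros described above.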

A two-pass approach is needed, through each input block, first to identify the string matches (each of which is saved to a table for the second pass), and to increment the frequency distributions for the output codes. The Huffman trees can then be generated for those output codes. The second pass is very simple:

    MATCH* match=matchtable;
    for(n=0;n< matches;n++)
        {
         while(offset< match->offset)
            {
             //output literal symbol, 0 through 255
             offset++;
            }
         //output match->len code
         //output match->distance code
         offset+=match->len;  //skip the matched bytes before moving on
         match++;
        }
    while(offset< inputblocklen)
        {
         //output literal symbol, 0 through 255
         offset++;
        }

This simplistic approach resulted in a bit more compression on the DOCX XML files, and served to debug the full range of DEFLATE length and distance codes.

With the addition of another table, this time 32k*16-bit offsets (I call this the chain table), when an entry in the main 32Mb table is found to be occupied, offsets can be moved to the chain table and the later match patched in. This results in chains of triplet-matches. For each triplet match encountered in the first pass, these chains can be checked for longer string matches, naturally favouring the nearer matches (shorter distances) which are first in the chains. This is the essence of DEFLATE. This algorithm produces the 75-90% compression ratio we expect. Direct comparison with Microsoft Word16 DOCX compression reveals this algorithm to be about 0.3% worse on the compression ratio. The difference is down to lack of the "lazy-matching" tweak. I prefer speed and simplicity of code over absolute compression.
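One way the main table and chain table could work together is sketched below. The helper names (triplet_key, chain_insert, longest_match) and the exact layout are assumptions for illustration, not the ChaOS source; offsets are kept 1-based as in the main table.

```c
#include <stdint.h>

#define WINDOW 32768u   /* chain table: one 16-bit entry per input offset */

static uint32_t triplet_key(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}

/* Record offset off for triplet key: the older 1-based offset held in the
   main table moves to chain[off], and the main table is patched to off + 1. */
static void chain_insert(uint16_t *table, uint16_t *chain, uint32_t key, unsigned off)
{
    chain[off] = table[key];
    table[key] = (uint16_t)(off + 1);
}

/* Walk the chain from nearest to farthest candidate and return the longest
   match length (capped at 258) for position off; *dist gets its distance. */
static unsigned longest_match(const uint8_t *buf, unsigned len, unsigned off,
                              const uint16_t *table, const uint16_t *chain,
                              uint32_t key, unsigned *dist)
{
    unsigned best = 0;
    for (uint16_t e = table[key]; e != 0; e = chain[e - 1]) {
        unsigned cand = e - 1u, n = 0;
        while (n < 258 && off + n < len && buf[cand + n] == buf[off + n])
            n++;
        if (n > best) {              /* strict > keeps the nearer match on ties */
            best = n;
            *dist = off - cand;
        }
    }
    return best;
}
```

The strict comparison is what favours shorter distances: a farther candidate only wins if it is genuinely longer.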

21.11.17 - ZIP/UNZIP/DOCX Added code to {unzip} to extract a full archive to current directory and subdirectory tree, creating also a ziplist file to record the contents of the archive. Developed {zip} further to process ziplist into a multi-file .zip package, all fairly straightforward. Running UNZIP/ZIP sequence now produces a file which can be loaded by Windows10, but which produces an error when attempting to extract (inflate) parts from the archive (whilst Debian Linux works fine).

Internet Wisdom suggests the way around this problem (and similar problems with older .zip files and newer Windows versions) is to download a program called 7-Zip, decompress then re-compress the archive, which does work. However I correctly guessed that the Windows issue is with the way in which a zero-length Huffman distance tree is stored in the Deflate block head. Deflate block encoding stores the length of the distance tree, less 1, in 5 bits. Therefore it is not possible to store an empty distance tree, it must have at least one leaf node. Linux allows the symbol for this single leaf node to evaluate to zero; Windows however requires this single leaf node to be non-zero.

With this tweak in place I have a .zip format which is acceptable to Windows10. Therefore I can now disassemble a DOCX into its component parts, then reassemble them for the Windows10/Word2016 environment.
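The distance-tree tweak can be sketched as follows, assuming an array of 30 distance code lengths. The function name hdist_field is hypothetical, and giving symbol 0 a one-bit length is my reading of the fix described above, not a copy of the {zip} source.

```c
#include <stdint.h>

/* Given distance code lengths lens[0..30), return the value for the 5-bit
   block header field (number of distance codes minus 1). If no distance code
   is used at all, symbol 0 is given a length of 1 so that the single stored
   leaf is non-zero. */
static unsigned hdist_field(uint8_t *lens)
{
    unsigned count = 0;
    for (unsigned i = 0; i < 30; i++)
        if (lens[i] != 0)
            count = i + 1;          /* highest used distance symbol + 1 */
    if (count == 0) {
        lens[0] = 1;                /* dummy one-bit code, never emitted */
        count = 1;
    }
    return count - 1;
}
```

A block made only of literals then carries one dummy distance code, which costs a few header bits and nothing in the data stream.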

One extra tweak relates to DOCX image components which initially were missing from my reassembled packages. Parts such as JPEG may be stored uncompressed into the ZIP package, because they would not compress down much further, if at all. An extra parameter in ziplist records the compression method (now 8 or 0), so {zip} can reassemble the package with the same compression mix.

17.11.17 - Simple .ZIP file Initial Deflate algorithm complete, using just literal symbols (no distance codes), wrapped into {zip} project to create a simple .zip file with one Local File Header, one Central Directory File Header and one End of Central Directory record. Compression ratio is awful, just 33% but the object at the moment is to get a working DOCX format. Debian Linux accepts the .zip image and displays the single-file contents, and will decompress the output provided the crc32 is correct. Just for the record the crc is performed on the Inflated file image.
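For reference, a minimal bitwise sketch of the CRC-32 used by ZIP (reflected polynomial 0xEDB88320), run over the uncompressed data. The function name crc32_update is illustrative; a table-driven version would be faster but this form is easier to check.

```c
#include <stdint.h>
#include <stddef.h>

/* Continue a CRC-32 over n bytes at p. Pass 0 as crc for the first call and
   the previous return value for later calls. */
static uint32_t crc32_update(uint32_t crc, const uint8_t *p, size_t n)
{
    crc = ~crc;
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return ~crc;
}
```

The standard check value for the nine ASCII bytes "123456789" is 0xCBF43926, which makes a handy self-test.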

13.11.17 - Huffman Encoding Source code in ZLIB ZIP archive is mature and complex, probably easier to write a DEFLATE algorithm for ChaOS from scratch, rather than hacking the Linux source code. Created a debug version of my INFLATE code, to produce a symbol frequency distribution for the first Huffman tree encoded in a sample DEFLATE block (this being the code_length array).

I already know how to build a Huffman tree from the simple array of code lengths in a DEFLATE block - the question is can I create a Huffman tree from a symbol frequency array?

After four days the answer is YES. It is relatively easy to sort and coalesce leaf nodes to produce a Huffman tree, but near impossible to rearrange nodes at each level into ascending symbol order. However the code length array as encoded at the start of a DEFLATE block can more or less be read straight from the tree hierarchy - code length is simply the number of parent links up to the tree root node. A pass through the tree to find code lengths for each symbol also reveals the minimum and maximum code lengths. With this information it is dead simple to generate the Huffman bit codes corresponding to each code length, remembering we need the length AND the bitcode to generate a token for the DEFLATE output stream:

    UL  lvl,m,code=0;
    for(lvl=mincode;lvl<=maxcode;lvl++)
        {
         for(m=0;m< syms;m++)
            {
             if(codelen[m]==lvl)
                {                        //ascending syms at each level
                 bitcode[m]=code;code++; //produce adjacent ascending codes
                }
            }
         code<<=1;
        }
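The pass that produces codelen[], mincode and maxcode for the loop above might look like this. It is a sketch assuming each tree node carries a parent pointer and that leaf[m] is NULL for unused symbols; the NODE type and the code_lengths name are illustrative.

```c
typedef struct node { struct node *parent; } NODE;   /* root has parent == 0 */

/* Code length of each symbol = number of parent links from its leaf to the
   root. Unused symbols (null leaf) get length 0 and are ignored for min/max. */
static void code_lengths(NODE **leaf, unsigned syms,
                         unsigned *codelen, unsigned *mincode, unsigned *maxcode)
{
    *mincode = ~0u;
    *maxcode = 0;
    for (unsigned m = 0; m < syms; m++) {
        unsigned len = 0;
        for (const NODE *n = leaf[m]; n && n->parent; n = n->parent)
            len++;
        codelen[m] = len;
        if (len == 0)
            continue;                       /* unused symbol */
        if (len < *mincode) *mincode = len;
        if (len > *maxcode) *maxcode = len;
    }
}
```

For a three-symbol tree with one leaf directly under the root and two leaves one level down, this gives lengths 1, 2, 2, and the loop above then assigns the codes 0, 10 and 11.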

9.11.17 - ED/UNZIP/RED Quickly hacked together a selectlist program {unzip} to display contents of ZIP file and decompress content on a keypress into a rough read-only hack {red} of my source editor {ed}. Added //z: hyperlink format to {ed} similar to //p: hyperlink which invokes my PDF reader in a separate screen area for browsing references while editing source files.

Added global clipboard to ChaOS, copy function to {red} and paste function to {ed}. Now I can transfer ZIP archive content straight to {ed} without cluttering my disks with extracted files. Added parameters to //z: hyperlink to go straight to a nominated file + source line - really useful. Stopped short of a grep search of the ZIP for a text match, but not difficult to do should I ever need to find a needle in a ZIP haystack.

8.11.17 - XML/DOCX/PKZIP Investigating DOCX file format and nibbling at the associated ECMA Specifications to discover that DOCX is in fact a PKZIP archive containing a group of XML parts. Also discovered that compression method within these packages is type 8 DEFLATE, and identical to PDF format except that ZLIB header and trailing checksum are omitted. I already have code running in ChaOS to INFLATE these blocks, so could see the uncompressed DOCX parts within short order.

Further serendipity in downloading a GitHub archive, to have a look at some source code for the DEFLATE algorithm - exactly the same compression is used here too.

6.11.17 - {xhc}/kbd Crafted my first interrupt TRB ring for USB keyboard on XHC. Easiest is to clear the link->toggle flag, so that TRBs can be re-used unchanged next time around. Keyboard Transfer Complete interrupts disrupt the MSD driver, (more work to do here), but code worked straight away on an Asus BeeBox - a quantum step towards running ChaOS on machines with no PS/2 port.

2.11.17 - {xhc}/msd Playing around with SCSI READ16/WRITE16 READCAP/READCAP16 to discover USB->SATA docking stations can actually access sectors above the 2Tb ceiling. Modified {xhc}/msd to invoke READCAP16 when READCAP returns 0xffffffff sectors, and switch to READ16/WRITE16 CBW protocol for drives >2Tb.
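The READCAP fallback and the 16-byte command block can be sketched as below. The helper names are hypothetical and the CBW wrapping is left out; the field positions follow the standard SCSI READ CAPACITY(10) data and READ(16) CDB layouts.

```c
#include <stdint.h>

static uint32_t be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | p[3];
}

/* READ CAPACITY(10) data: bytes 0-3 last LBA, bytes 4-7 block length, both
   big-endian. 0xffffffff in the LBA field means the drive does not fit the
   10-byte form and READ CAPACITY(16) should be issued instead. */
static int needs_readcap16(const uint8_t *cap10)
{
    return be32(cap10) == 0xffffffffu;
}

/* Fill a 16-byte READ(16) CDB (opcode 0x88) with a 64-bit LBA in bytes 2-9
   and a 32-bit block count in bytes 10-13. */
static void build_read16(uint8_t *cdb, uint64_t lba, uint32_t blocks)
{
    for (int i = 0; i < 16; i++)
        cdb[i] = 0;
    cdb[0] = 0x88;
    for (int i = 0; i < 8; i++)
        cdb[2 + i] = (uint8_t)(lba >> (56 - 8 * i));
    for (int i = 0; i < 4; i++)
        cdb[10 + i] = (uint8_t)(blocks >> (24 - 8 * i));
}
```

With 512-byte sectors a 32-bit LBA tops out at 2Tb, which is why the 10-byte commands cannot reach beyond that ceiling.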

1.11.17 - {xhc}/msd Added Mass Storage Device class driver to XHC, now able to back up development partitions to SS USB stick five times faster than USB2.

30.10.17 - {xhc} USB configuration now advanced through getUDD, getproductstring, getconfigdescriptors, enough then to construct an Input Context for a USB Mass Storage Device, including Endpoint Contexts for the standard Bulkin/Bulkout pipes. Thus Slot State is advanced to Configured state, ready to try talking to the device proper.

Command Completion code 17s are returned until the Input Context is filled in properly for this - key points are the Input Context size is encoded in the Slot Context (keep count of max endpoint index when processing Endpoint Descriptors into EndPoint Context blocks) and Input Context->add&1 needs to be set, besides the relevant endpoint bit fields, to flag a Slot Context update.
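A sketch of the bookkeeping involved, assuming the endpoint addresses have already been pulled from the Endpoint Descriptors. The helper names are made up; the index rule (endpoint number times two, plus one for IN) and the position of the context-entries field in the top five bits of the first Slot Context dword follow the xHCI specification as I understand it.

```c
#include <stdint.h>

/* Device Context Index for a bEndpointAddress value: endpoint number * 2,
   plus 1 for an IN endpoint. The control endpoint 0 is index 1. */
static unsigned ep_dci(uint8_t ep_addr)
{
    unsigned num = ep_addr & 0x0fu;
    if (num == 0)
        return 1;
    return num * 2 + ((ep_addr & 0x80u) ? 1u : 0u);
}

/* Build the Input Control Context add flags: bit 0 flags the Slot Context,
   bit DCI flags each endpoint. *max_dci receives the highest index seen. */
static uint32_t input_add_flags(const uint8_t *ep_addrs, unsigned n, unsigned *max_dci)
{
    uint32_t flags = 1u;                /* Slot Context update */
    *max_dci = 1;                       /* endpoint 0 is always present */
    for (unsigned i = 0; i < n; i++) {
        unsigned dci = ep_dci(ep_addrs[i]);
        flags |= 1u << dci;
        if (dci > *max_dci)
            *max_dci = dci;
    }
    return flags;
}

/* Store the highest endpoint index in bits 31:27 of Slot Context dword 0. */
static uint32_t slot_set_context_entries(uint32_t dw0, unsigned max_dci)
{
    return (dw0 & 0x07ffffffu) | ((uint32_t)max_dci << 27);
}
```

For a typical Mass Storage Device with Bulk IN 0x81 and Bulk OUT 0x02, the indexes come out as 3 and 4.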

Cache synchronization is needed before gathering data from the XHC dma buffers - I have tried many ways to do this but only asm{wbinvd} does the trick.

27.10.17 - {xhc} Initial framework now in place, device connect and disconnect events now creating/destroying container xUSB DEVs, with XHC slot allocation, endpoint 0 configuration and transition to XHC Addressed state. Simple Endpoint 0 TRB ring with Link TRB and PCS toggle all working, using a GETDESCRIPTOR request as test fodder.

What caught me out for a while is the aggressive power-saving of USB3, which requires a check of USB2/USB1 link state before attempting any USB traffic. The symptom is Command Completion code 4 (USB transaction error), and the cure is a setlinkstate(15),msdelay(20),setlinkstate(0) sequence which should bring the link state to 3 (running) with a PORTSC Event. I do not fully understand the power saving regime yet, that will come later.

23.10.17 - {xhc} Event Ring and Command Ring now running. xHC hardware reset stops spurious startup interrupts; of course all registers then need to be programmed from scratch. Using NoOp (TRB type 23) to test the command ring, and get the RCS flag working correctly. xHC reports 13 available ports, 1 to 9 are USB2, 10 to 13 are USB3, some are connected internally on my laptop, some are unconnected. xHC PCI[0xd0] is a port routing register to switch ports between the PCH EHC controller and the xHC. Flicking these switches to 1 causes a Port Status change event for each port with a device attached. Copying the contents of PCI[0xdc] (which equals 3) to PCI[0xd8] enables the two marked SuperSpeed external USB sockets for USB3 device connect on port 10 or 11. If a USB2 device is plugged into these sockets it appears on port 1 or 2.
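Queuing a NoOp on the command ring can be sketched as below, assuming a single-segment ring whose last slot already holds a Link TRB pointing back to the start. The TRB struct and ring_put_noop are illustrative names; ringing the doorbell afterwards is left out.

```c
#include <stdint.h>

typedef struct { uint32_t d[4]; } TRB;      /* 16-byte TRB, control word in d[3] */

#define TRB_TYPE(t)   ((uint32_t)(t) << 10) /* TRB type lives in bits 15:10 */
#define TRB_CYCLE     1u                    /* cycle bit */
#define TRB_TOGGLE    2u                    /* Link TRB: toggle cycle */
#define TRB_NOOP_CMD  23u

/* Write a NoOp command at ring[enq] using producer cycle state *pcs and
   return the new enqueue index. On reaching the Link TRB in the last slot,
   hand it to the controller and toggle *pcs if its toggle flag is set. */
static unsigned ring_put_noop(TRB *ring, unsigned size, unsigned enq, unsigned *pcs)
{
    ring[enq].d[0] = ring[enq].d[1] = ring[enq].d[2] = 0;
    ring[enq].d[3] = TRB_TYPE(TRB_NOOP_CMD) | (*pcs & 1u);
    if (++enq == size - 1) {
        ring[enq].d[3] = (ring[enq].d[3] & ~TRB_CYCLE) | (*pcs & 1u);
        if (ring[enq].d[3] & TRB_TOGGLE)
            *pcs ^= 1u;
        enq = 0;
    }
    return enq;
}
```

Each NoOp should come back as a Command Completion event, which makes it a cheap way to confirm that both rings and the cycle state are in step.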

20.10.17 - {xhc} Beginning work on USB3 via {xhc}, a driver for the XHCI controller on my laptop.

First job is to understand and construct an Event Ring, to provide a stream of interrupts and Event TRBs. Although PCI:0x3c register as filled in by BIOS on this device indicates a connection to pic vector 7, this vector number is absent from the LPC Bridge redirection registers, and no PCI pin interrupts come through to my handler. This is fixed by using the Message Signalled Interrupt mechanism, setting MSI Address Register to 0xfee00000 and MSI Data Register to 0x0030/0x0130/0x4030/0x4130/0xc030/0xc130 for a handler on IVT vector 0x30.
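The MSI register values can be composed from their fields. This sketch follows the x86 MSI message format (vector in bits 7:0, delivery mode in bits 10:8, level in bit 14, trigger mode in bit 15, destination APIC ID in address bits 19:12); the function names are illustrative.

```c
#include <stdint.h>

/* MSI address for a given destination local APIC ID. */
static uint32_t msi_address(uint8_t apic_id)
{
    return 0xfee00000u | ((uint32_t)apic_id << 12);
}

/* MSI data word: delivery 0 = fixed, 1 = lowest priority. */
static uint16_t msi_data(uint8_t vector, unsigned delivery,
                         int level_assert, int level_trigger)
{
    return (uint16_t)(vector |
                      ((delivery & 7u) << 8) |
                      (level_assert ? 0x4000u : 0u) |
                      (level_trigger ? 0x8000u : 0u));
}
```

Seen this way, the six data values listed above are the same vector 0x30 with different delivery-mode, level and trigger bits.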

The Event Ring is easy enough to set up, remembering that the XHC is the TRB producer, and software is the consumer of events, opposite to the other TRB rings in the device driver scenario. One fact either missing from the documentation, or which I have overlooked in the documentation is that the EHB flag in the Interrupter Dequeue Pointer latches on if this register is written with an invalid address - e.g. for an empty ring, just after setting ERSTBA - anything other than the first ring segment base address will latch EHB. Of course this then generates an interrupt, but with no event - the XHC is protesting about an attempt to advance the dequeue pointer before an event has been produced.

To discover this barely-useful fact, I had made an algorithmic error inside my interrupt handler, advancing the Dequeue pointer too far. The flood of interrupts produced confused me for a while - expecting just one interrupt per second from MFINDEX rollover events but getting 1000 interrupts per second - until one realizes this rate is the default setting for IMOD - the XHC Interrupt Moderation Register.
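A sketch of a consumer loop that only advances the dequeue pointer past events actually produced, plus the IMOD arithmetic. It assumes a single-segment event ring; the names are illustrative, and the figures used (EHB as bit 3 of the dequeue pointer register, IMOD interval in 250ns units with a default of 4000) are my reading of the xHCI specification.

```c
#include <stdint.h>

typedef struct { uint32_t d[4]; } TRB;   /* 16-byte event TRB, cycle bit in d[3] bit 0 */

/* Consume events while the TRB cycle bit matches the consumer cycle state
   *ccs. Returns the number consumed; *deq never passes the producer. */
static unsigned drain_events(const TRB *ring, unsigned size, unsigned *deq, unsigned *ccs)
{
    unsigned n = 0;
    while ((ring[*deq].d[3] & 1u) == *ccs) {
        n++;                              /* handle the event here */
        if (++*deq == size) {
            *deq = 0;
            *ccs ^= 1u;                   /* wrapped: expect the other cycle value */
        }
    }
    return n;
}

/* Value to write back to the dequeue pointer register: address of the next
   unconsumed TRB, with bit 3 set to clear EHB. */
static uint64_t erdp_value(uint64_t seg_base, unsigned deq)
{
    return (seg_base + 16u * deq) | 8u;
}

/* Upper bound on interrupts per second for an IMOD interval in 250ns units. */
static unsigned imod_max_irq_per_sec(unsigned imodi)
{
    if (imodi == 0)
        return 0;                         /* 0 means no moderation, so no cap */
    return 1000000000u / (imodi * 250u);
}
```

An interval of 4000 works out to one interrupt per millisecond, which matches the 1000 interrupts per second seen when the handler overshot.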

Also of interest is the programming of 64-bit MMIO registers on the XHC