In the previous article I gave you an in-depth introduction to the Cray X-MP system. With that, the stage is set to discuss the simulator I’ve built, which is what this story is about.
For the impatient…
Once you’ve downloaded the simulator and uncompressed it, you can start it (from a command prompt in the ‘bin’ folder) with the following command:
This should fire up the simulation of an X-MP model 48, and start booting into the recovered COS 1.17 image.
You should see several console windows (these are the serial terminals attached to the IOPs) pop up, showing something similar to this:
What you see here is the IOS (the IOP runtime) boot screen, showing the configuration dump of the particular IOP. There is one window for each IOP.
Select the one for IOP-0 (seen at the top of the window). Type in:
START @DK0:COS_117 @DK0:INSTALL
Make sure you use all upper-case letters, and hit enter. Soon after, you’ll see the following:
What you just did was load the COS image into the mainframe and provide it with a ‘parameter’ file called INSTALL. Since this is the first time you run the system, the disks are empty and need to be initialized. This is the main goal of the install procedure. After the install, you normally start the system using the DEADSTART parameter file.
The messages inform you about a successful boot. Now, type in:
A new window appears with the following content:
This is the main window for the ‘Cray Station’, the equivalent of a console on COS. Now, type
You should get the following response:
The status line at the top now shows an ‘L’, meaning you’ve logged in. The response line informs you of the OS version and build date.
Type HELP to see the available commands:
This is just the first help screen. Some commands, like help have multiple screens (frames) of output. You can switch between them using the ’
+’ and ’
If you want detailed usage information on a command, type HELP <command name>, for example
Let’s try one of these! Type MONITOR,CPU. You get this:
As expected: the CPU idles since we haven’t started any jobs yet.
And we won’t… I don’t know how to submit jobs, and my attempts to do so have failed so far. At this point I can poke around the system and see various interesting-looking statistics, but there’s not much else I know how to do. Please, please, please contact me if you have more information!
If you recall the block-diagram of the X-MP system, you’ll see something like this:
The mainframe itself has the following internal setup:
In order to simulate the system, I had to simulate all of that (and then some).
When – several years ago – I first started working on the simulator, I thought that I would spend most of my time simulating the CPUs. Then I contacted the Cray-Cyber guys about using their YMP-EL to do some early testing. They replied that their machine was unfortunately off-line – so much for early testing – but they also had the following warning:
99% of the task of recreating a machine is the I/O system, the CPU is
the trivial part.
In retrospect, I have to absolutely agree. I’ve spent an enormous amount of time re-creating the I/O subsystem. The CPUs (even with the IOP CPUs included) were indeed the easy part. On top of the CPUs there are a lot of peripherals attached to the system, and simulating them is crucial to successfully booting the OS. After all, one of the OS’s main responsibilities is to provide interfaces to the peripherals.
The state of the simulation of the various pieces is the following:
The simulation of the mainframe CPUs is quite complete, with the exception of the performance counters and maintenance-mode instructions. The simulator throws an exception if the code ever attempts to execute one of these.
Some parts of the CPUs are better tested than others; the floating-point and vector operations are the weakest parts.
Clusters and shared registers are also simulated.
You can set how quickly the real-time clock expires, making simulated time go faster than the simulated CPU clock would suggest.
The IOPs are fully simulated with their associated internal peripherals. They’ve churned through a lot of code over the course of this project so I’m fairly confident in the implementation. The weakest part is probably the simulation of the carry flag and which instructions actually change it.
Just as with the CPUs, you can set how quickly the real-time clock ticks, making simulated time go faster than the simulated CPU clock would suggest.
Both the CPUs and the IOPs support a so-called ‘burst’ mode where they can execute several instructions in one ‘clock’ cycle. In this mode the CPU or the IOP will continue executing instructions until it sees an I/O or jump instruction, where it will break the burst. Of course, the burst also breaks once the predefined number of instructions has been executed. This option speeds up simulation considerably, and since the simulator is not cycle-accurate anyway, it won’t reduce simulation accuracy.
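As an illustration only, the burst rule could be sketched like this in Python. The names `Kind`, `run_burst` and `fetch_and_execute` are hypothetical and not taken from the simulator’s sources, and the sketch assumes that the instruction which breaks the burst is itself still executed within it.

```python
from enum import Enum, auto


class Kind(Enum):
    """Hypothetical instruction classes; only the distinction matters here."""
    ALU = auto()
    JUMP = auto()
    IO = auto()


def run_burst(fetch_and_execute, max_burst):
    """Run one simulated 'clock' tick in burst mode.

    fetch_and_execute() executes a single instruction and returns its Kind.
    Returns the number of instructions executed during this tick.
    """
    executed = 0
    while executed < max_burst:
        kind = fetch_and_execute()
        executed += 1
        # Assumption: the jump or I/O instruction is executed, then the
        # burst ends so the rest of the system gets a chance to run.
        if kind in (Kind.JUMP, Kind.IO):
            break
    return executed
```

With `max_burst` set to 1 this degenerates to ordinary one-instruction-per-tick stepping, which is one way to switch the feature off.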
The IOP simulation supports the definition of various ‘break-points’. If the IOP executes the instruction at the break-point address, the break-point triggers and its actions take place. I’ve found these most useful in controlling the log-level so that interesting pieces of code get logged in great detail while others are not logged at all. At some point a similar facility existed for the main CPUs as well, but during the many re-writes that feature broke and I never got around to fixing it.
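A minimal sketch of such an address-triggered break-point table, assuming a hypothetical `Breakpoints` class and a plain dictionary standing in for the processor state; the real simulator’s structures are not described in this article.

```python
class Breakpoints:
    """Address-triggered break-points, each with a list of actions."""

    def __init__(self):
        self._actions = {}  # address -> list of callables

    def add(self, address, action):
        """Register an action to run when `address` is about to execute."""
        self._actions.setdefault(address, []).append(action)

    def check(self, address, state):
        """Run the actions for `address`; return True if any triggered."""
        fired = self._actions.get(address, [])
        for action in fired:
            action(state)
        return bool(fired)


# Example: raise the log level when a (made-up) address is reached.
state = {"log_level": "none"}
bps = Breakpoints()
bps.add(0o1234, lambda s: s.update(log_level="all"))
bps.check(0o1234, state)
```

Because the lookup is keyed on the address alone, it cannot tell which piece of code currently occupies that address.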
Memories for CPUs and IOPs are simulated as well as the shared Buffer memory of the IOPs. The access latencies are of course not, since the simulation is not cycle-accurate. I also decided to not simulate errors, so parity and ECC checks are not performed and no memory errors are ever reported.
Some memories support ‘pokes’. These are set up in the configuration file and allow for pre-reset modification of the content of the memories. This is useful, for example, for patching over lengthy (and in a simulation environment completely useless) memory checks.
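Conceptually, a poke is just an (address, value) pair applied to the memory image before the processor is released from reset. A small sketch, assuming a hypothetical `apply_pokes` helper and a list of integers as the memory; the actual configuration syntax is not shown here.

```python
def apply_pokes(memory, pokes):
    """Overwrite selected words of `memory` before reset.

    memory: mutable sequence of words (integers).
    pokes:  iterable of (address, value) pairs, as they might be read
            from a configuration file (format assumed for illustration).
    """
    for address, value in pokes:
        if not 0 <= address < len(memory):
            raise IndexError(f"poke address {address:#o} is outside memory")
        memory[address] = value


memory = [0] * 8
apply_pokes(memory, [(2, 0o177777), (5, 0o000001)])
```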
These are the low-speed channels between the IOPs and the CPUs. They are fully simulated with the exception of channel (parity) errors.
High-speed memory channels
The high-speed DMA channels between the IOPs and the mainframe memory are fully simulated, again with the exception of parity errors and diagnostic modes.
The SSD is not simulated at all. The configuration of the machine that I have the boot image for didn’t have an SSD so simulating it was not necessary so far.
The Peripheral Expander was an interface device that attached to one of the IO channels of the MIOP and itself had several IO channels connecting to actual peripherals. The documentation I had contained a detailed description of the peripheral expander itself, but it had zero information on the peripherals attached to it. With a lot of reverse-engineering I managed to create a somewhat functional simulation for most of the peripherals that can attach to (an X-MP) peripheral expander. These peripherals act as the ‘local’ devices for the IOS, the kernel running on the IOPs. (The rest of the peripherals are available for COS, running on the mainframe.)
As it turns out – and I figured this out just recently – the Peripheral Expander interfaces to Data General (Eclipse or Nova) peripherals. These popular computers were first used as the maintenance computer for the original Cray-1 machines, so the use of the same peripheral set makes some sense.
There are four kinds of peripherals that can attach to the Peripheral Expander: a tape drive, a hard-drive, a line printer and a real-time clock. There were a couple of peripheral options for each of these. I decided to simulate the ones that the COS image I had was configured with, or – in case of the RTC – the one that was easier to reverse-engineer.
This is a simulation of the ‘Data General Tape Drive’. At the beginning of the project I had no documentation, so I built the simulation model by reverse-engineering the IOS modules talking to the tape drive. From this work (and reading up on tape drives in general) I gathered the following:
A tape contains a set of files, separated by EOF markers. The tape itself is terminated on one end by a BOT (beginning of tape) marker – a shiny piece of metal glued to the tape – and an EOT (end of tape) marker on the other. Usually a tape drive can seek forward and backward on the tape, searching for an EOF or EOR (see later) mark. It can also read forward or backward, and of course write or erase.
I decided to simulate a tape using a directory. The files in that directory would be the files on the tape. The files would have a strict, numerical name: 0.dat for the first file, 1.dat for the second, etc. This structure makes simulation of BOT, EOT and EOF markers easy. Seeking and reading in either direction is also simple. Writing and erasing, however, are a bit problematic: on tape, at least in theory, you can erase one big file and write three small ones in its place. This can be simulated by dynamically renaming the files after the write-point, but you have to keep track of the original (now partially overwritten) file size. For now, I’ve opted for a simpler method: first, I only allow writes to start at the beginning of a file, and second, I delete all further content from the tape once a write operation is started. It can of course be improved upon, but it’ll suffice for now.
Because tape writes are so destructive, you can mark the tape as ‘read only’ in which case all write operations fail.
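The directory-as-tape idea, including the destructive write rule and the read-only flag, can be sketched roughly as follows. The class and method names are hypothetical; only the 0.dat, 1.dat, … naming scheme and the two write restrictions come from the description above.

```python
from pathlib import Path


class DirectoryTape:
    """A tape modelled as a directory holding 0.dat, 1.dat, 2.dat, ..."""

    def __init__(self, directory, read_only=False):
        self.directory = Path(directory)
        self.read_only = read_only

    def file_path(self, index):
        return self.directory / f"{index}.dat"

    def file_count(self):
        """Count consecutive files from 0.dat; EOT follows the last one."""
        count = 0
        while self.file_path(count).exists():
            count += 1
        return count

    def begin_write(self, index):
        """Start writing tape file `index` from its beginning.

        Everything from the write point to the end of the tape is
        discarded first, mirroring the simplification described above.
        """
        if self.read_only:
            raise PermissionError("tape is marked read-only")
        total = self.file_count()
        if index > total:
            raise ValueError("cannot start a write past the end of the tape")
        for stale in range(index, total):
            self.file_path(stale).unlink()
        return open(self.file_path(index), "wb")
```

In this model, BOT is simply "positioned before 0.dat" and EOT is "positioned after the highest-numbered file", so no marker data needs to be stored.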
Later during the project, I found some documentation on these tape drives. Reading this I realized that tape files had a record structure to them: reads and writes were performed one record at a time. The record size was arbitrary and determined during the write operation. Seeks counted their progress in the number of records skipped, not in bytes (or words). This is very hard to simulate in the current framework. Luckily, the IOS code only ever uses 2048-word (4k) records, except of course at the end of a file, where a smaller record is used to store the remainder of the data. This meant that I could hard-code the record size into the simulator and get it working without an expensive re-write. For now, at least.
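With the record size fixed, locating a record inside a tape file is plain arithmetic. A small sketch, with hypothetical function names and the 4096-byte figure taken from the ‘2048 word (4k)’ statement above:

```python
RECORD_BYTES = 4096  # 2048 words per record, i.e. 4 KB as stated in the text


def record_count(file_size):
    """Number of records in a tape file of `file_size` bytes."""
    return (file_size + RECORD_BYTES - 1) // RECORD_BYTES


def record_span(file_size, record_index):
    """Return (byte_offset, byte_length) of one record within the file.

    Every record is full-sized except possibly the last, which holds
    whatever remains of the file.
    """
    if not 0 <= record_index < record_count(file_size):
        raise IndexError("record index is outside the file")
    offset = record_index * RECORD_BYTES
    return offset, min(RECORD_BYTES, file_size - offset)
```

A seek that skips n records then becomes a matter of adding n to the current record index and clamping at the file boundaries.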
The expander disk seems to be a very simple device. It supported reads and writes to any sector and that’s about it. It might have had more complex functions, but there’s no hint of them in the IOS code; in other words, if there were any, they’re not used. It had four heads, 823 tracks and 5 sectors in each track. A sector contains 2048 words (4 KB).
The simulator uses a raw disk image file to store the content of the disk.
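For a raw image, each (track, head, sector) triple has to map to a byte offset in the file. The sketch below uses the geometry given above; the ordering of sectors within the image (track first, then head, then sector) is an assumption made for illustration, not something confirmed by the simulator’s sources.

```python
HEADS = 4
TRACKS = 823
SECTORS_PER_TRACK = 5
SECTOR_BYTES = 4096  # 2048 words per sector

IMAGE_BYTES = HEADS * TRACKS * SECTORS_PER_TRACK * SECTOR_BYTES


def sector_offset(track, head, sector):
    """Byte offset of a sector in the raw image (assumed layout)."""
    if not (0 <= track < TRACKS and 0 <= head < HEADS
            and 0 <= sector < SECTORS_PER_TRACK):
        raise ValueError("address is outside the disk geometry")
    linear = (track * HEADS + head) * SECTORS_PER_TRACK + sector
    return linear * SECTOR_BYTES
```

With this geometry the whole disk is 16,460 sectors, or about 64 MB of image file.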
IOS uses a special – rather simple – file-system on the expander disk. I’ve reverse-engineered most of it, and created a program that can pre-populate a disk image with files from your hard-drive. This program is called ‘exp_disk_create’.
Expander line printer
I simulate the ‘Gould printer’ variant, which had a rather simple protocol: you set it up with a text buffer and it got printed. There are some control codes that are sent occasionally. I’ve assigned functions like line-feed or page-feed to them, but I’m not absolutely sure I did that correctly. The simulator dumps the printer’s output into a (configurable) file, not to a physical printer.
The machine that I have the OS image for didn’t have a real-time clock. However, it still had all the drivers for it and it was fairly easy to enable them. This allowed me to reverse-engineer the communication protocol and simulate the clock.
There are a few interesting things about this device. One, it uses a variant of the AT protocol – yes, the one used for modems. It also happened to be manufactured by Hayes, so I guess the protocol choice makes some sense. What doesn’t is that the RTC occupies 3 expander channels: one for control, one for outgoing AT commands and one for incoming responses. This doesn’t jibe well with any of the other expander peripherals and it doesn’t fit nicely into the expander peripheral driver framework of IOS either. It honestly feels like a big hack. But at least it works, and who knows, maybe there was a reason for this.
Since the IOS and COS software have serious Y2K issues, I decided to cheat in the simulation of the clock. While it does in fact read the date from the host machine, it subtracts twenty from the year, so it reports the date as if we were still in 1993.
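The year trick amounts to very little code. A sketch, assuming a hypothetical `simulated_now` helper; the only detail taken from the text is the twenty-year offset.

```python
from datetime import datetime

YEAR_OFFSET = 20  # keeps the reported year in the 1900s for the old software


def simulated_now(host_now=None):
    """Return the host's date and time shifted back by YEAR_OFFSET years."""
    if host_now is None:
        host_now = datetime.now()
    try:
        return host_now.replace(year=host_now.year - YEAR_OFFSET)
    except ValueError:
        # Only reachable for 29 February when the target year is not a
        # leap year; fall back to 28 February in that case.
        return host_now.replace(year=host_now.year - YEAR_OFFSET, day=28)
```

Since twenty is a multiple of four, leap days normally map onto leap days, so the fallback branch is a safety net rather than a common path.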
DD-29 disks were used as the main storage device for the mainframe. While the IOS image has drivers for several (older and newer) drives, the configuration that I have the image for had nine of these drives connected to it.
This was lucky, as the DD-29 was the only drive that had decent documentation in the Cray-1S Hardware Reference Manual, so implementing these was mostly smooth sailing. They are roughly 600MB drives each, though I’ve made the number of sectors, tracks and heads configurable through the configuration file. The simulated drives use a flat image file for storage, which is created on first execution if it doesn’t exist. You can also make the disks read-only using the ‘WriteEnable’ option in the configuration file.
The original consoles of the Cray X-MPs were what seem to be modified Wyse-50 serial terminals. The console simulation – instead of serial interfaces – uses TCP sockets to communicate. Through the configuration file, you can specify the port to listen on for each console as well as an optional program to launch when data wants to flow to a disconnected console. This allows the simulator to automatically spawn a new console whenever communication is attempted with one that isn’t connected.
Since I haven’t found a free Wyse-50 terminal emulator out there, I’ve written a very simple one, called wy50_con. It only implements a small subset of the escape sequences of the original Wyse-50 terminals, the ones I’ve actually seen on the output. I’ve also seen some – either undocumented or non-Wyse-50-compatible – escape sequences for line-drawing coming from the Cray Station built into the IOS. These are also implemented in the terminal emulator.
Front-end interfaces (sometimes called concentrators) are, as far as I could understand them, early versions of network interfaces. They provided access to the mainframe from remote machines, using the Network Systems Corporation HYPERchannel network. These remote machines in turn would run their version of the Station software, where users could create and submit jobs, monitor their progress and download the results. The station software was available for a number of machines, including DEC VAX (running VMS), CDC Cyber (running NOS), IBM 370 and 390 (running MVS) and Apollo computers.
At this point the simulator contains only a very rudimentary front-end interface, enough to not crash the OS, but not providing any useful functionality. Luckily we don’t need one, as the IOS running on the IOPs has a full-featured station implementation in it.
These interfaces were used to connect the Cray to IBM-compatible peripherals, probably mostly tape drives and maybe punch-card readers. The simulation as of now is next to non-existent and probably very buggy. It lets the OS continue, determine that there’s something terribly wrong and never touch the interfaces again. A lot more work needs to be done to get the simulation to a functional stage.
IBM tape drives
While there is a stub for tape-drive emulation, it’s in a very early stage at this point. A lot of work is needed to get this functional. At least there seems to be some documentation on the protocol available.
State of the union simulator / Download
So where does this leave us? I have a simulator that can boot quite a ways into the COS image that I’ve recovered. It gets stuck at some point, and I would love to have more information on what it is supposed to do, to determine where to go next. I’m fairly certain that the system is not fully operational yet and there are still bugs to work out. Unfortunately, as the project progresses the bugs become more elusive and harder to track down.
You can download the sources and executables from the Download page.
If you wish to compile it, you’ll also need a recent Boost library. I’ve worked with 1.53.0. You might have to patch up include and library paths to match your installation of Boost before things start to work.
The solutions should work under Visual Studio 2012. The makefiles are for MinGW. I’ve used GCC 3.8.0 and GNU make 3.81 for my builds. I’ve attempted a Linux port as well. I’ve gotten it to compile, but many tools, including the simulator, seem to crash immediately, even before entering main. Even on Linux (due to using some C++11 features) you’ll need a fairly recent GCC; I had to compile mine from sources (4.8.0).
Finally a list of things left to do:
- Most importantly, the system doesn’t yet fully boot. There could be all sorts of bugs lurking in the shadows that cause this, not least the fact that I don’t know what should happen. If you have seen a Cray booting and have an idea of the expected behavior of the system, please contact me.
- The simulation of several peripherals is incomplete (see list above). This can result in system misbehavior.
- I don’t have any toolchain or source code for the system. Even if I could get the OS to completely boot, there’s not much I can do with it.
- The simulation is not cycle-accurate. This can cause all sorts of issues, though so far I haven’t seen any code that would rely on timing.
- Break-points are not implemented for the main CPUs at the moment, only for the IOPs.
- Even for the IOPs they could be more useful: right now they only trigger on addresses, and due to the overlay mechanism a particular address can belong to several pieces of code over the course of the simulation.
- The simulator is slow. It’s probably not even within 1% of the original speed.
And with that, we’ve reached current times. This is where the project stands at the moment. I’ll add new content as I have new developments, but…
I Need Your Help!
If you know something useful, have used Crays in a previous life, have old SW, tools, source code, data, pretty much anything for these machines, or are just interested in helping make this simulator more functional, please contact me!