Wednesday 25 May 2011

And here come the elves!

Elves, what do you mean by that?

Well, there is this file format called ELF. It is an acronym for Executable and Linkable Format.

Now what does this mean, or what the heck, why not what is a file?

Good question. What is a file? It seems so obvious: a file is a set of collected data. But how does the computer know that this piece of data is part of this file, and that what you've placed in another file is indeed the content of that other file?

This has to do with the way the hard disk is made up but also how the main memory or RAM works.

The system implements a file-system. Now what is a file-system? It is basically an index recording which regions of the disk are part of one file and which regions are part of another.

Okay, this is on disk, now what about in main memory?

The main memory contains tables which point to places on the disk. These tables are read from fixed places on the disk, or rather the partition. For simplicity's sake we'll assume the disk holds only one partition, meaning those two terms can nearly be used interchangeably.

So we now have a table of pointers to what? Files? Directories? Other tables?

Well, I'm still investigating this part, but from what I know the table points to files. There can be special files called directories, and other types of files called symbolic links.

A special case is the hard link, which makes two directory entries point to the same file.

That's fine and all, but how do we get all this from disk to memory? Is it just a matter of reading from a pointer and getting the answer, or is there something more complicated going on?

Well, I know it's a little more complicated than that, but I still need to investigate this bit. All I know is that Grub has done some things for me, and I can simply use a region in memory, given to me by Grub, in which the core image is loaded. I need to parse it myself into the regions where it needs to be.

So the core image needs to be put somewhere other than where it is right now. Why is that, and how do you do such a thing?

The reason the image needs to be relocated is that it is an ELF image, which is more or less compressed. That's fine and all, but it means the image misses some very important things. One of them is that no space is reserved for variables, only indicators of where they need to be. Another issue is that I have written my code to go to a fixed place, somewhere very high (or very low, depending on how you look at it) in memory. Grub can't load to this place since it doesn't support virtual memory (at least, it doesn't initialise it for the client, according to the Multiboot specification). This means I need to initialise it myself and put the image there.

I've chosen ELF because it's flexible, but I could also have chosen the plain binary format. With plain binary, though, linking is a bit tougher. From a security point of view that isn't necessarily a bad thing, but from a development perspective it could mean the image is harder to inspect with an object dump.

As to how we do this, it turns out to be quite well documented in the ELF specification. The ELF header holds pointers into the file, with notes on where each section should go, and all I need to do is put the sections into place. Once that's done, I can (in the case of the core image) transfer control to it. In the case of a user-space application I should probably fork first.

Now, to get at the image as it is on disk, it first needs to be transferred to main memory. Luckily Grub has done all the hard work for me, meaning I can transfer control without needing a messy hard-disk driver.

Monday 16 May 2011

Hello paging world!

Yes, that's right. We've just entered paging mode.

I just made the commit available which solves the double fault caused by the page fault. The issue came from the halt & catch fire instruction, because it releases after an interrupt.

I had some issues with paging because I hadn't mapped the VGA memory and the stack. Those two are now solved.

So what is paging precisely?

Paging means using tables, walked by the CPU, to translate virtual addresses into actual physically accessible memory.

How do we do that?
Well, to explain that we're going to split the memory up into parts. We're also going to split your memory addresses up into parts. Furthermore, we'll expect the Intel CPU to be in 32-bit protected mode, so our linear addresses reach up to 4 GiB.

The memory is divided into pages, each 4096 bytes (4 KiB) in size. That means we have about a million pages (2^20, to be exact). To reference these pages, we're going to need page tables.

Each page table is exactly one page in size, with each entry being 4 bytes. That means only 1024 pages can be referenced by a single page table. If you've been paying attention, you can see that this covers only about 0.1% of the whole address space (4 MiB out of 4 GiB).

That's somewhat of an issue, but because the designers at Intel aren't stupid, they've created what they call a page directory. This page directory also holds 1024 entries of 4 bytes each, and these entries point to the page tables. That means we've now got 1024 × 1024 pages. That's more like it.

Now, if you don't understand how this works, you should probably read the Intel manual Volume 3A, chapter 6. It also covers PAE and long-mode paging. Furthermore, it holds the flags required for the implementation.

All the cache bits are set to 0, so the default mode gets used.

Tuesday 10 May 2011

All on a big heap

Now what have I been working on?

Well, I've been working on dynamic heap allocation. It took me quite a while, as it required me to use the memory map I had to set up, to figure out which pages I am able to use and which I should leave as they are.

What are pages?

Pages are basically chunks of memory. They are usually the same size and can be used for many different tasks. One of the most common is to let every process think it owns the entire address space while in reality there is only, say, 10 MiB of physical memory in place. Also, this memory must be shared between processes, which makes it even more complex.

And how did you get this memory map?

The memory map was provided to me by Grub. Grub uses the BIOS to figure out what can be used and what can't. It provides areas, which might overlap. Because this map is so unreliable in some regions, I decided to set up my own memory map, in a much more flexible style.

What kind of style is that?

I chose to give each physical page an owner. This can be used in the future to decide whether or not a process can actually access it. This is all dumped into an array, which is currently 2 MiB in size.

2 MiB? That's huge!
I know, especially if you only have 10 MiB of memory. However, those PCs are rarely ever found these days, and the smallest machine I have found has 64 MiB, so the table is only about 3% of the actual amount of physical memory.

And why was this so hard?

Well, I made a small error in the memory map, and because of that I started writing data into my code. And if there's anything that's difficult to detect, it's writing data into code.
I've solved it now, and since the code has passed every test suite, I think it's time for a new tag, called Indev-0.0.2. It marks the current commit as the latest "stable" indev release.

So what can we see coming in the future?

Well, I think in the near future paging will start happening, and once that's done, I think I'll start worrying about ELF-loading the core image, and of course jumping into that image.
Beyond that the planning isn't fixed, but I think I'll (or hope it's we'll by that time) be working on getting into user mode. Once that's done I will probably start worrying about drivers and such, but I don't know for sure.

Wednesday 4 May 2011

Have you been gone?

Ok, there have been few code updates lately. But there's a reason for that.
I was working on getting the decompressed image working, and at this moment that's more a matter of thinking than of writing code.

So what have you done?

Well, I've split the linker script up into one for the decompressed image and one for the compressed image. I removed the hardware interrupt code from the compressed image, and I've written a new entry point, this time for the decompressed image (because the linker started complaining).
I also had to do some work on the makefiles, which, fortunately for me, is done now.

Where are you going next?

Well, there are still some issues related to expanding the code. For example, I don't want to write an #ifndef around every source file I don't want in the compressed image. Unfortunately I still don't quite know how to do this. It will probably boil down to rewriting the Makefiles, though.
I also want to get some space from Grub to put the heap for the compressed image in. I don't know how I'm going to do this either, but I think it will be a case of declaring a humongous1 variable which will then be my heap. But if there is a possibility to do this from the linker script, I'll do that, since I think that's a more elegant2 way of coding.

1Something in the region of 32 MiB.
2I consider elegant code to be:

  • expandable
  • readable
  • understandable
  • reusable
  • stable