Permalink
Browse files

Initial Commit

  • Loading branch information...
1 parent 25f83bb commit 7521a76c4db50d5fcc2e9223f037f3409dcc207c @taviso committed Sep 28, 2012
Showing with 2,250 additions and 2 deletions.
  1. +3 −0 ChangeLog
  2. +339 −0 LICENSE
  3. +26 −0 Makefile
  4. +3 −0 NEWS
  5. +63 −0 RARTemplate.bt
  6. +234 −2 README.md
  7. +113 −0 bitbuffer.c
  8. +15 −0 bitbuffer.h
  9. +83 −0 bitbuffer_test.c
  10. +11 −0 gdbinit
  11. +347 −0 parser.c
  12. +15 −0 rar.h
  13. +156 −0 raras.c
  14. +222 −0 rarld.c
  15. +31 −0 sample.rs
  16. +25 −0 stdlib/constants.rh
  17. +188 −0 stdlib/crctools.rh
  18. +43 −0 stdlib/math.rh
  19. +44 −0 stdlib/util.rh
  20. +27 −0 test/Makefile
  21. +23 −0 test/bitorder.rs
  22. +23 −0 test/bswap.rs
  23. +39 −0 test/compensate.rs
  24. +24 −0 test/crc32.rs
  25. +18 −0 test/fib.rs
  26. +25 −0 test/helloworld.rs
  27. +25 −0 test/mod.rs
  28. +37 −0 test/operands.rs
  29. +23 −0 test/vectormatch.rs
  30. +25 −0 test/vectorrow.rs
View
@@ -0,0 +1,3 @@
+2012-09-27 Tavis Ormandy <taviso@cmpxchg8b.com>
+ * First commit with working crctools package.
+ * Added some testfiles.
View
339 LICENSE

Large diffs are not rendered by default.

Oops, something went wrong.
View
@@ -0,0 +1,26 @@
+CFLAGS = -ggdb3 -O0 -march=native -std=gnu99 -Wall
+LDFLAGS = $(CFLAGS) -lz
+RARAS = ./raras
+RARLD = ./rarld
+
+%.ri: %.rs
+ cpp -Istdlib < $< > $@
+
+%.ro: %.ri
+ $(RARAS) -o $@ $<
+
+%.rar: %.ro
+ $(RARLD) $< > $@
+
+all: raras rarld sample.rar test
+rarld: rarld.o bitbuffer.o
+raras: parser.o raras.o bitbuffer.o
+bitbuffer_test: bitbuffer_test.o bitbuffer.o
+
+test: bitbuffer_test
+ ./bitbuffer_test
+ make -C test all
+
+clean:
+ rm -f *.o raras rarld *_test *.ri *.ro *.rar
+ make -C test clean
View
3 NEWS
@@ -0,0 +1,3 @@
+
+Version 0.1-alpha, 2012-09-27
+ * Initial alpha release.
View
@@ -0,0 +1,63 @@
+
+typedef enum<ushort> {
+ MHD_VOLUME = 0x0001,
+ MHD_COMMENT = 0x0002,
+ MHD_LOCK = 0x0004,
+ MHD_SOLID = 0x0008,
+ MHD_PACK_COMMENT = 0x0010,
+ MHD_AV = 0x0020,
+ MHD_PROTECT = 0x0040,
+ MHD_PASSWORD = 0x0080,
+ MHD_FIRSTVOLUME = 0x0100,
+ MHD_ENCRYPTVER = 0x0200,
+} MHDFLAGS;
+
+typedef enum<ushort> {
+ LHD_SPLIT_BEFORE = 0x0001,
+ LHD_SPLIT_AFTER = 0x0002,
+ LHD_PASSWORD = 0x0004,
+ LHD_COMMENT = 0x0008,
+ LHD_SOLID = 0x0010,
+ LHD_LARGE = 0x0100,
+ LHD_UNICODE = 0x0200,
+ LHD_SALT = 0x0400,
+ LHD_VERSION = 0x0800,
+ LHD_EXTTIME = 0x1000,
+ LHD_EXTFLAGS = 0x2000,
+} LHDFLAGS;
+
+typedef struct _VOLUMEHDR {
+ ushort header_crc;
+ uchar header_type;
+ ushort header_flags;
+ ushort header_size;
+} VOLUMEHDR;
+
+typedef struct _MAINHDR {
+ ushort HighPosAv;
+ uint PosAv;
+} MAINHDR;
+
+typedef struct _FILEHDR {
+ uint PackSize;
+ uint UnpSize;
+ uchar HostOS;
+ uint FileCRC;
+ uint FileTime;
+ uchar UnpVer;
+ uchar Method;
+ ushort NameSize;
+ uint FileAttr;
+} FILEHDR;
+
+typedef struct _PACKED {
+ uint PpmBlock : 1;
+ uint KeepTables : 1;
+ // A canonical huffman tree
+ // ...
+} PACKED;
+
+uchar signature[7];
+VOLUMEHDR VolMainHeader;
+VOLUMEHDR VolFileHeader;
+FILEHDR FileHeader;
View
236 README.md
@@ -1,2 +1,234 @@
-rarvmtools
-==========
+RarVM Toolchain
+===============================================================================
+
+This is a basic toolchain for the RarVM, a virtual machine included with the
+popular WinRAR compression suite. Rar includes a VM to support custom data
+transformations to improve data redundancy, and thus improve compression
+ratios. However, it also represents a widely deployed machine architecture
+about which very little is known...that is just too tempting a target for
+exploration to ignore :-)
+
+Currently two basic tools are available for experimentation, a linker and an
+assembler. A dissassembler will be available soon, and perhaps eventually a
+compiler (in the form of a llvm backend or gcc target).
+
+Usage
+===============================================================================
+
+There are five types of files referenced in this document, they are explained below.
+
+ * .rs : A RarVM assembly program.
+ * .rh : A RarVM header file.
+ * .ro : A RarVM object file.
+ * .ri : A CPP preprocessed RarVM assembly file, i.e. `cpp < input.rs > input.ri`
+ * .rar : A RarVM program.
+
+Recipes for GNU Make might look like this:
+
+ %.ri: %.rs
+ $(CPP) < $< > $@
+
+ %.ro: %.ri
+ $(RARAS) -o $@ $<
+
+ %.rar: %.ro
+ $(RARLD) $< > $@
+
+You can use C macros and includes if you wish, the assembler understands cpp
+directives and ignores them, so simply pipe your program through cpp before
+assembling it, producing a .ri file.
+
+Architecture
+===============================================================================
+
+Familiarity with x86 (and preferably intel assembly syntax) would be an advantage.
+
+RarVM has 8 named registers, called r0 to r7. r7 is used as a stack pointer for
+stack related operations (such as push, call, pop, etc). However, as on x86,
+there are no restrictions on setting r7 to whatever you like, although if you
+do something stack related it will be masked to fit within the address space
+for the duration of that operation.
+
+A RarVM program may execute for at most 250,000,000 instructions, at which
+point it will be terminated abnormally. However there are no limits on the
+number of programs included in a file, so you must simply split your task into
+multiple 250M cycle chunks. If the instruction pointer ever moves
+outside the defined code segment, the program is considered to have completed
+successfully, and is terminated normally.
+
+Your program will execute with a 0x40000 byte address space, and has access to
+3 status flags : ZF (Zero), CF (Carry) and SF (Sign) which can be accessed via
+the conditional branch instructions, or via pushf/popf (as on x86).
+
+Operands.
+-------------------------------------------------------------------------------
+
+The operand types supported for RarVM are as follows
+
+ * Registers (`r0, r1, ..., r7`)
+ * Memory references (`[#0x12345]`)
+ * Register indirect references (`[r0]`)
+ * Base + Index indirect references (`[r4+#0x1234]`)
+ * Immediates (`#0x12312`)
+
+And as a convenience to programmers, the assembler also supports symbolic
+references that are resolved to immediates at compile time, simply prefix them
+with $.
+
+ * Symbolic references (`jmp $next_loop`)
+
+So a trivial demonstration program might help.
+
+---
+
+#include <constants.rh>
+#include <crctools.rh>
+#include <math.rh>
+#include <util.rh>
+ ; vim: syntax=fasm
+
+
+ ; Test RAR assembly file that just demonstrates the syntax.
+ ;
+ ; Usage:
+ ;
+ ; $ unrar p -inul helloworld.rar
+ ; Hello, World!
+ ;
+
+ _start:
+ ; Install our message in the output buffer
+ mov r3, #0x1000 ; Output buffer.
+ mov [r3+#0], #0x6c6c6548 ; 'lleH'
+ mov [r3+#4], #0x57202c6f ; 'W ,o'
+ mov [r3+#8], #0x646c726f ; 'dlro'
+ mov [r3+#12], #0x00000a21 ; '!\n'
+ mov [VMADDR_NEWBLOCKPOS], r3 ; Pointer
+ mov [VMADDR_NEWBLOCKSIZE], #14 ; Size
+ call $_success
+
+---
+
+Rar programs can either supply their own initial register values, or use the
+default set which includes some system information. If you do not specify your
+own registers, these will be set:
+
+ * r0: zero
+ * r1: zero
+ * r2: zero
+ * r3: Address of Global Memory Buffer
+ * r4: Filter Block Length
+ * r5: Filter Exec Count
+ * r6: zero
+ * r7: End of available memory (and thus the stack grows down on RarVM).
+
+Flags are always initialised to zero.
+
+Currently I havn't invented a syntax for describing your own initial registers,
+so you're stuck with these. Maybe a section called .regs?
+
+ section .regs
+ dd 0
+ dd 0
+ dd 0
+
+Then perhaps a .data section for encoding the leading data bytes rar supports.
+If you want to produce output, you must write a pointer to your data into the
+global memory section, and record the size. See the helloworld.rs in the test
+directory for an example.
+
+In general, symbols starting with '`_`' are defined or reserved by the system for
+special use in the standard library or internally. Symbols starting with '`__`'
+are used internally and should not be called by you.
+
+The magic symbol "`_start`" is important, and is considered the entrypoint for
+your program, so you must define it (Rar has no concept of entrypoint, I
+emulate it with an implicit jump at startup).
+
+Stdlib
+-------------------------------------------------------------------------------
+
+A collection of useful routines is included in the stdlib directory, you can
+include these files and call the routines they define. For example, there is no
+way to get the remainder of a division operation in rar, so a software
+approximation is defined in math.rh:
+
+ #include <math.rh>
+
+ push #2
+ push #123
+ call $_mod
+
+Calling conventions.
+
+I use the following conventions, feel free to ignore them if you prefer
+something hideous like pascal.
+
+ * No registers are saved by callee, except r6 and r7.
+ * The return code is placed in r0.
+ * Parameters are passed cdecl style (reverse order) on the stack.
+ * r6 is used as a frame pointer, and you're required to preserve it.
+
+See the standard library for examples.
+
+Known Bugs
+===============================================================================
+
+There are several known bugs in the RarVM.
+
+`[redacted as some have security consequences]`
+
+Status
+===============================================================================
+
+This is a list of what currently is believed to work:
+
+ * Basic programs appear to run on the Linux unrar program.
+ * I havn't tested Windows yet, because I'm still porting the CRC tools to RarVM.
+
+FAQ
+===============================================================================
+
+Q. Can I produce output from my program?
+
+A. Yes, but the answer is complicated.
+
+ Recall that the RAR format requires a checksum for the output in order for
+ it to successfully process your program, so you must pad your output to this
+ CRC in order to complete. A template has been included to demonstrate
+ this technique at runtime based on CRC tools by Julien Tinnes.
+
+ This requires crc padding to occur at exit, which involves adding a 32bit
+ value to the output at a specific location.
+
+ Yes, this is complicated, but you're programming WinRAR, what did you
+ expect? The ridiculous constraints are what makes it interesting ;-)
+
+Q. Can I supply input to my program?
+A. All input must be included in the archive, there is no way to provide
+ external input at runtime.
+
+Q. Can I write self-modifying Code?
+A. I don't think so, you can think of the code as occupying a different x86
+ segment from your data, so is not addressable. So, in x86 terms, you might
+ say ds = ss, but cs != ds.
+
+Q. Can I embed data within code?
+A. You could, using opaque predicates, but as you cannot read from the code
+ segment, I don't think it would be useful.
+
+Q. How do I force the use of smaller integer encodings to save space?
+A. You send me a patch to do that.
+
+Q. How do I set the "Init Registers" option rar supports?
+A. You send me a patch to do that.
+
+Q. How do I add data to the "InitData"?
+A. You send me a patch to do that.
+
+Q. Whats the deal with the different branch types?
+A. I'll add support when I can.
+
+Q. What does "print" do?
+A. I have no idea! Does it do something on Windows? (please test and report)
+
Oops, something went wrong.

0 comments on commit 7521a76

Please sign in to comment.