This project was made in memory of the old computer I played around with as a young boy. I have a lot to thank this machine for; among other things, it made me understand what I wanted to do with my life. So in this project I created software and hardware that make it possible to play those wonderful games once again.



I always wanted to write an emulator for the Commodore 64, and I finally pulled myself together to get one up and running. It is not flawless in any way, but it works quite decently considering the hours I spent on it and my almost non-existent knowledge of the HW beforehand. Many shortcuts were also needed to get the emulation of all components running with as few clock cycles as possible. Full optimization, inlined code and code running from RAM are some examples of what helps save clock cycles and makes the emulation acceptable. Also, as you may know, fast code is almost never compatible with code that has a nice structure and is pleasing to the eye. While making this project I was very excited and amused to see the progress. Since seeing this progress was sometimes the only thing that kept me going, the code was written quite fast and the quality suffered for it.

Most of the software was developed using Visual Studio. The reason is that Visual Studio is the only program from Microsoft that I like, and debugging there is very easy compared to on-chip debugging on the Memwa board. When running from Visual Studio on a PC, the sound needs to be emulated, unlike on the Memwa board, which has the actual SID chip installed. This is done using the reSID C++ library.

The only piece of code that is downloaded to the Memwa board and that was not written by me is the Reed-Solomon algorithm used to handle the MLC flash memory. So thanks to whoever wrote this code.


Emulation is very difficult, and the more accurate the emulation is, the more performance it consumes. Ideally, you want hardware powerful enough to emulate all components and still have cycles left over to render the screen at exactly 50 FPS. If the execution is too fast, you always have the option to pause it for a while.


In my opinion, an MCU running at 235 MHz is not enough to emulate the Commodore 64. In fact, the software created in this project can only render about 20 FPS, give or take, depending on which functionality the different components use (e.g. graphics mode, number of timers etc.). Also, emulating this many components simultaneously makes the timing fluctuate a lot, which makes a mess of the sound and movement.
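To put the 235 MHz figure in perspective, here is a rough cycle budget. It is only a back-of-the-envelope sketch: the PAL C64 clock of 985248 Hz is a value I am adding here (it is not stated above), and the function names are made up for illustration.

```c
#include <stdint.h>

#define HOST_CLOCK_HZ     235000000u /* overclocked STM32F4 */
#define C64_PAL_CLOCK_HZ  985248u    /* assumption: PAL C64 CPU clock */
#define TARGET_FPS        50u

/* Host cycles available to emulate ONE C64 cycle for all components. */
static uint32_t host_cycles_per_c64_cycle(void)
{
    return HOST_CLOCK_HZ / C64_PAL_CLOCK_HZ;
}

/* Host cycles available per displayed frame at the target frame rate. */
static uint32_t host_cycles_per_frame(void)
{
    return HOST_CLOCK_HZ / TARGET_FPS;
}

/* C64 cycles that must be emulated per displayed frame. */
static uint32_t c64_cycles_per_frame(void)
{
    return C64_PAL_CLOCK_HZ / TARGET_FPS;
}
```

With these numbers there are only a couple of hundred host cycles for every C64 cycle, and that budget has to cover the CPU, VIC, CIAs, the disk drive and the pixel rendering.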


A simple load balancer was created as a remedy for the performance fluctuations; it keeps the frame rate constant at about 50 FPS. The load balancer looks at two factors, time and lines rendered, where time is simply the time it takes for each frame to be displayed. When a line is rendered, every pixel on that line is calculated individually using a set of conditions determined by the VIC component. This is very time (cycle) consuming. So by altering the number of lines that are rendered in each frame, it's possible to balance the load and get a correct frame rate (assuming that the hardware is never capable of rendering all lines at the correct FPS).
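The idea can be sketched roughly as below. This is not the actual Memwa code; the names, the step size and the lower limit are assumptions chosen only to show the principle of trading rendered lines for frame time.

```c
#include <stdint.h>

/* All names and constants are illustrative, not taken from the Memwa source. */
#define LINES_TOTAL      200u   /* visible lines of the 320x200 screen */
#define LINES_MIN        20u    /* assumed floor for lines rendered per frame */
#define LINES_STEP       4u     /* assumed adjustment per frame */
#define FRAME_TARGET_US  20000u /* 50 FPS gives 20 ms per frame */

typedef struct {
    uint32_t lines_per_frame; /* how many lines to fully render next frame */
} load_balancer_t;

static void lb_init(load_balancer_t *lb)
{
    lb->lines_per_frame = LINES_TOTAL;
}

/* Call once per frame with the measured frame time.
 * Returns the number of lines to render in the next frame. */
static uint32_t lb_update(load_balancer_t *lb, uint32_t frame_time_us)
{
    if (frame_time_us > FRAME_TARGET_US) {
        /* Too slow: render fewer lines next frame. */
        if (lb->lines_per_frame >= LINES_MIN + LINES_STEP)
            lb->lines_per_frame -= LINES_STEP;
        else
            lb->lines_per_frame = LINES_MIN;
    } else if (frame_time_us < FRAME_TARGET_US) {
        /* Time to spare: render more lines next frame. */
        if (lb->lines_per_frame + LINES_STEP <= LINES_TOTAL)
            lb->lines_per_frame += LINES_STEP;
        else
            lb->lines_per_frame = LINES_TOTAL;
    }
    return lb->lines_per_frame;
}
```

A fixed step keeps the adjustment cheap and avoids large jumps from one frame to the next; a proportional step would converge faster but could oscillate more.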

So why not render only the pixels or lines that changed since the previous frame?
Getting a constant frame rate with this technique is very hard, but I think it is possible. I tried it twice with two different implementations but was not successful; I ran into problems when the whole screen was updated (e.g. when a memory pointer changed), when the border color changed etc. I also tried it together with the load balancer, which I think is the correct approach, but still had major problems getting a stable FPS.



Illegal opcodes are not verified, and the SID chip is not emulated.


The CPU of the Commodore 64 is emulated as a 6502. The actual CPU is a different model (the 6510), but that does not matter much. The emulation mimics all the assembler instructions of the 6502 so that they give the same result on the STM32F4 MCU. An opcode is a single byte, so there can be at most 256 of them. Since it is specified exactly how many Commodore 64 cycles each instruction takes, the CPU component works as a clock for the other components in the system.
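As a small illustration of the CPU acting as a clock, here is a sketch of a step function that executes one instruction and returns the number of cycles it took. Only four opcodes are handled, and the structure and names are assumptions for this example rather than the real Memwa implementation.

```c
#include <stdint.h>

#define FLAG_Z 0x02u /* zero flag */
#define FLAG_N 0x80u /* negative flag */

typedef struct {
    uint8_t  a;          /* accumulator */
    uint8_t  status;     /* processor status register */
    uint16_t pc;         /* program counter */
    uint8_t  mem[65536]; /* flat memory, no banking in this sketch */
} cpu_t;

static void set_nz(cpu_t *c, uint8_t v)
{
    c->status = (uint8_t)(c->status & ~(FLAG_Z | FLAG_N));
    if (v == 0)   c->status |= FLAG_Z;
    if (v & 0x80) c->status |= FLAG_N;
}

static uint8_t fetch8(cpu_t *c)
{
    return c->mem[c->pc++];
}

static uint16_t fetch16(cpu_t *c)
{
    uint16_t lo = fetch8(c);
    uint16_t hi = fetch8(c);
    return (uint16_t)(lo | (hi << 8));
}

/* Executes one instruction and returns the C64 cycles it consumed.
 * The caller advances the other components (VIC, CIAs, ...) by that amount. */
static unsigned cpu_step(cpu_t *c)
{
    uint8_t op = fetch8(c);
    switch (op) {
    case 0xA9: c->a = fetch8(c); set_nz(c, c->a); return 2; /* LDA #imm */
    case 0x8D: c->mem[fetch16(c)] = c->a;         return 4; /* STA abs  */
    case 0x4C: c->pc = fetch16(c);                return 3; /* JMP abs  */
    case 0xEA:                                    return 2; /* NOP      */
    default:                                      return 2; /* not handled in this sketch */
    }
}
```

A main loop would call `cpu_step()` and pass the returned cycle count on to the other components, so that everything stays in step with the CPU.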


The graphics are a little complicated, and an exact emulation of the VIC chip would be very time consuming, hard, and demanding on the HW. This is why I took some shortcuts with this component, making the graphics less than perfect, but it is a reasonable tradeoff for getting something working with as little effort as possible. The VIC chip is so complex and flexible that it is hard to pin down any fixed specification. You can do a lot of tricks with this device. Having said this, the VIC chip, on paper, supports a resolution of 320×200 (not including borders) together with 8 sprites. The sprites are blocks of 24×21 pixels (twice the size if expanded) that can be positioned and controlled individually. The VIC can also detect collisions between sprites and other sprites or the background. It supports a number of different graphics modes.


These peripheral components mainly handle timers (CIA1 and CIA2), keyboard input (CIA1) and the serial port (CIA2) on the Commodore 64.


This component emulates a datasette device.


These peripheral components mainly handle timers (VIA1 and VIA2), the serial port (VIA1) and the disk drive controller (VIA2) on the 1541 disk drive.


Emulating the SID chip would take a humongous number of cycles, which I do not have. The HW SID chip itself is used instead, so there is no emulation of the sound.


This component handles all the memory in the 1541 disk drive. It works the same as the BUS component.


This component handles all the memory in the Commodore 64. It provides an interface to read and write the memory. It takes care of the connections between different components and makes the necessary modifications for the surrounding hardware configuration. For example, the disk drive and the Commodore 64 are connected through a serial port. The register for the former is located at address 0x1800 in VIA1 and for the latter at 0xDD00 in CIA2. The HW lines between VIA1 and CIA2 pass through pull-ups, pull-downs, inverters etc. The registers also look very different. To take one example, the attention line is connected to bit 7 on VIA1 but to bit 3 on CIA2. When some component uses the BUS interface to modify memory, all other components connected to this memory (directly or by HW lines) are informed through a simple callback.
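The callback mechanism and the attention line example could look something like the sketch below. The function names and the listener table are hypothetical, and the line is copied without inversion for simplicity; the real wiring goes through inverters as mentioned above, so the polarity here is an assumption.

```c
#include <stdint.h>
#include <stddef.h>

#define CIA2_PORT_ADDR  0xDD00u /* serial port register on the C64 side */
#define CIA2_ATN_BIT    3u      /* attention line at CIA2 */
#define VIA1_ATN_BIT    7u      /* attention line at VIA1 */
#define MAX_LISTENERS   4u

typedef void (*bus_write_cb)(uint16_t addr, uint8_t value);

static uint8_t      c64_mem[65536];
static uint8_t      via1_port_b; /* stands in for the 1541 register at 0x1800 */
static bus_write_cb listeners[MAX_LISTENERS];
static size_t       listener_count;

/* Registers a component that wants to hear about writes. Returns 0 on success. */
static int bus_subscribe(bus_write_cb cb)
{
    if (listener_count >= MAX_LISTENERS)
        return -1;
    listeners[listener_count++] = cb;
    return 0;
}

/* Writes to C64 memory and informs every subscribed component. */
static void bus_write(uint16_t addr, uint8_t value)
{
    c64_mem[addr] = value;
    for (size_t i = 0; i < listener_count; i++)
        listeners[i](addr, value);
}

/* Disk drive side: mirror the attention line from CIA2 bit 3 to VIA1 bit 7.
 * Polarity is NOT modeled here (assumption, see the text above). */
static void drive_on_c64_write(uint16_t addr, uint8_t value)
{
    if (addr != CIA2_PORT_ADDR)
        return;
    uint8_t atn = (uint8_t)((value >> CIA2_ATN_BIT) & 1u);
    via1_port_b = (uint8_t)((via1_port_b & ~(1u << VIA1_ATN_BIT)) |
                            ((unsigned)atn << VIA1_ATN_BIT));
}
```

After `bus_subscribe(drive_on_c64_write)`, any write to 0xDD00 through `bus_write()` updates bit 7 of the drive-side register, while writes to other addresses leave it untouched.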


Language: C

Compiler: GCC ARM Embedded

Development platforms: CooCox IDE, Visual Studio

Supported formats

Files that are supported:

T64, TAP, D64

Good stuff to know

The Commodore 64 has a number of ROM sections and, as the name suggests, also has 64 KB of RAM. Since the computer uses a 16-bit address bus, at most 64 KB can be addressed. It handles the shortage by using banks and switching them in and out. This is done by manipulating the first two addresses (0x0000 and 0x0001). Regardless of the bank setup, the VIC always sees certain memory at fixed locations (0x1000-0x2000 and 0x9000-0xA000 will always hold the character ROM). Writing to an address where ROM is switched in results in a write to the same address in the RAM “underneath”.
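A simplified sketch of this behaviour is shown below. It only models the BASIC and kernel ROM areas and two control bits of address 0x0001 (LORAM and HIRAM); the character ROM, the I/O area and cartridge lines are left out, and the function names are made up for the example.

```c
#include <stdint.h>

#define LORAM 0x01u /* bit 0 of 0x0001: together with HIRAM maps BASIC ROM */
#define HIRAM 0x02u /* bit 1 of 0x0001: maps kernel ROM */

static uint8_t ram[65536];
static uint8_t basic_rom[8192];  /* visible at 0xA000-0xBFFF when switched in */
static uint8_t kernel_rom[8192]; /* visible at 0xE000-0xFFFF when switched in */

/* Reads see ROM where it is switched in, otherwise RAM. */
static uint8_t mem_read(uint16_t addr)
{
    uint8_t port = ram[0x0001];

    if (addr >= 0xA000 && addr <= 0xBFFF && (port & LORAM) && (port & HIRAM))
        return basic_rom[addr - 0xA000];
    if (addr >= 0xE000 && (port & HIRAM))
        return kernel_rom[addr - 0xE000];
    return ram[addr];
}

/* Writes always end up in RAM, even if ROM is switched in at that address. */
static void mem_write(uint16_t addr, uint8_t value)
{
    ram[addr] = value;
}
```

Writing 0x42 to 0xA000 while BASIC is switched in is invisible to reads until the ROM is switched out, at which point the same address returns 0x42 from the RAM underneath.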


Hardware is really fun! I enjoy doing PCB designs for projects like these. This is why I decided to create a board for this project, to see if I could produce something that would run the emulation well enough. The SID chip was the crown jewel of this board, no doubt about it 🙂

The pin pitch of the chips was sometimes below 0.5 mm, which made soldering a bitch to be honest. But with some patience, sweat and a lot of flux it was indeed possible.


MCU: STM32F407 (overclocked by 67 MHz to 235 MHz)

Display controller: SSD1963

Flash Memory: H27UBG8T2BTRBC

Display: 7″ TFT 4-WRT


This handy little 100-pin MCU is rated for a maximum of 168 MHz. It seems stable at 235 MHz with the internal clock and without cooling. The MCU was used with FSMC, which made communication with the graphics chip and flash memory very easy and was necessary to get decent performance.

Display controller

A Solomon Systech SSD1963 chip was used to handle the 7″ display.

Flash Memory

A 4 GB MLC memory that holds about 30k C64 games. Being an MLC memory, it contains a lot of bit errors by default and also some bad sectors. In this project I have ignored the bad sectors (an LED indicates if a bad sector is found when loading a game). The bit errors, on the other hand, cannot be ignored and must be corrected using some ECC algorithm. A Reed-Solomon algorithm is used in this project, configured so that it can correct 5 bit errors for every 256 bytes read. The memory can be found on eBay.


This display has a resolution of 800×480 and comes with a 4-wire touch panel. It can be found at techtoys.


To build the initial prototype, sockets were needed for most of the major parts so that they could be tested.

Solomon 1963 EVK (techtoys)


QFP100 socket for MCU (waveshare)


TSOP48 socket for flash memory (ebay)




Design Software – CadSoft’s EAGLE PCB Design Software

Debugger – Olimex ARM-USB-TINY-H

Hardware design

I made the design as a 6-layer PCB with the same dimensions as the popular Raspberry Pi board. I liked the small dimensions and the different cases that are available for the Raspberry Pi. The board was manufactured by a fab called PCBCart, which I only have good things to say about.


Game Image

To make use of the flash memory, an image had to be created that holds all the games and also the ROMs needed for the C64 and the disk drive (kernel ROM, BASIC ROM, character ROM and 1541 kernel ROM). I made a simple program that took all the games I had in a specific folder (~30k games), made a 64-byte-aligned TOC of these games and created an image based on this. The image is about 4 GB large and is transferred to the board over a COM port (yes, it takes some time) using another simple utility program I made.
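How the alignment works could be sketched like this. It is one possible reading of "64-byte aligned", namely that every entry in the image starts on a 64-byte boundary; the struct layout and function names are assumptions and not the format used by the real image tool.

```c
#include <stdint.h>
#include <stddef.h>

#define IMAGE_ALIGN 64u /* every entry starts on a 64-byte boundary */

/* Hypothetical TOC entry: where a game starts in the image and how big it is. */
typedef struct {
    uint64_t offset; /* filled in by toc_layout() */
    uint64_t size;   /* size of the game file in bytes */
} toc_entry_t;

/* Rounds an offset up to the next multiple of IMAGE_ALIGN. */
static uint64_t align_up(uint64_t offset)
{
    return (offset + IMAGE_ALIGN - 1u) & ~(uint64_t)(IMAGE_ALIGN - 1u);
}

/* Assigns an aligned offset to every entry, starting at 'start'.
 * Returns the total (aligned) size of the image. */
static uint64_t toc_layout(toc_entry_t *entries, size_t count, uint64_t start)
{
    uint64_t offset = align_up(start);

    for (size_t i = 0; i < count; i++) {
        entries[i].offset = offset;
        offset = align_up(offset + entries[i].size);
    }
    return offset;
}
```

64-bit offsets are used because an image of about 4 GB is right at the limit of what a 32-bit offset can address.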


memwa board gerber files
memwa display board gerber files
memwa windows binary
memwa source code
memwa game bundle image (4 GB)
memwa utilities

One thought on “MEMWA 1”

  1. Can I ask you something? How do emulators on microcontrollers compare to original hardware in terms of input lag (milliseconds)? Various forms of lag are a great issue for software emulators (check out, e.g., this thread). Is it any better here since you don’t have to deal with a full OS sitting on top?

    BTW, have you ever tried programming the Raspberry Pi bare-metal? See:
