Why not go with 64 bytes and just scrap the whole RAM idea XD. Though if you are pipelining like you wish, I do suggest 64 bytes. While you may not fill it up, it gives you peace of mind, and 16 could be too little.
Perhaps 32 bytes? Happy medium?
I decided to go with 16 because it turns out that the total circuit time is 18 ticks, so I will always have 2 registers without data in the loop. Also, it's a convenient size.
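To illustrate the 18-tick / 16-register arithmetic, here is a rough Python model of a circulating register loop. It assumes (this is not confirmed in the thread) that each tick of delay holds one register's value and that the whole loop advances one slot per tick past a single read/write head; the class and method names are hypothetical.

```python
from collections import deque
from typing import Optional

LOOP_TICKS = 18      # total circuit time around the loop, from the post above
NUM_REGISTERS = 16   # registers stored in the loop

class DelayLineMemory:
    """Hypothetical model of a circulating register loop: one slot per tick of delay."""

    def __init__(self) -> None:
        empty = LOOP_TICKS - NUM_REGISTERS  # slots that never carry data (2 here)
        self.ids = deque(list(range(NUM_REGISTERS)) + [None] * empty)
        self.values = deque([0] * NUM_REGISTERS + [None] * empty)
        self.ticks = 0

    def tick(self) -> None:
        """Advance every slot one position around the loop."""
        self.ids.rotate(1)
        self.values.rotate(1)
        self.ticks += 1

    def ticks_until(self, reg: int) -> int:
        """Ticks before register `reg` reaches the read/write head at slot 0."""
        return (LOOP_TICKS - self.ids.index(reg)) % LOOP_TICKS

    def access(self, reg: int, value: Optional[int] = None) -> int:
        """Wait for `reg` to come around, optionally overwrite it, and return its value."""
        while self.ids[0] != reg:
            self.tick()
        if value is not None:
            self.values[0] = value
        return self.values[0]
```

In this model the worst-case wait for any register is 17 ticks, and the 2 empty slots are simply the leftover positions of the 18-tick loop.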
And I thought I was good at redstone... I talk in derp mode when pipelined CPUs and data-bit RAM loop things come into the conversation... btw, you're a good bragger.
I've just completed the timing portion of the reads, writes, and ALU. Now I must work on the instruction circuitry: programming the operation decoder, timing the input and output data bus, finding an effective way of incorporating jumping into the programs, and building a variable clock that can be switched between 3 and 18 ticks. With the upcoming school week, I may not have time to make significant progress until the weekend. Thanks for all your comments, guys; it's great to have this kind of support!
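A back-of-the-envelope sketch of what the 3-tick vs 18-tick clock buys. It is an idealized model, assuming each operation still needs the full 18 ticks end to end, that pipelined mode finishes one result per 3-tick clock after the first, and that there are no stalls or branches; the function name is hypothetical.

```python
FULL_CIRCUIT_TICKS = 18    # non-pipelined clock period (one op through the whole circuit)
PIPELINED_CLOCK_TICKS = 3  # pipelined clock period

def total_ticks(n_ops: int, pipelined: bool) -> int:
    """Idealized tick count for n_ops back-to-back operations (no stalls or branches)."""
    if n_ops <= 0:
        return 0
    if pipelined:
        # The first result still needs the full 18 ticks; after that, assume one
        # result finishes every 3-tick clock.
        return FULL_CIRCUIT_TICKS + (n_ops - 1) * PIPELINED_CLOCK_TICKS
    return n_ops * FULL_CIRCUIT_TICKS

# Up to 18 // 3 = 6 operations can be in flight at once in pipelined mode.
OPS_IN_FLIGHT = FULL_CIRCUIT_TICKS // PIPELINED_CLOCK_TICKS
```

Under these assumptions, 100 operations take 1800 ticks unpipelined versus 315 ticks pipelined, which is why the pipelined mode suits large batches of data.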
I've come across something interesting while testing the CPU so far (I've performed tests on non-CPU circuits and this is still true): uncontrolled circuits will eventually fall out of sync after some time. Initially the difference is negligible, but after about 10-20 repetitions it is on the order of ticks. I hope this is fixed in the upcoming 1.5 update, but if it isn't, its effects shouldn't be of much concern to my project. I've tested this in singleplayer and multiplayer. I will discuss it in my next update video.
Do you plan to build a branch predictor and dedicated branch memory?
I've made a 3 stage pipeline before (fetch --> exe --> address and save) and the biggest problem I encountered was that every register needed a dedicated sister register to store the data while executing branches.
If the system encountered a branch, it would instantly stop updating the sister registers and assume a jump. If, 2 clock ticks down the pipeline, it discovered that the jump was correct, it would start updating the sister registers as usual. If the jump was incorrect, it would set the main registers to the values in the sister registers and resume from there (data was restored to the point of the jump, so in essence nothing had changed).
The main problem was RAM updating. I had only 2 clock ticks of delay between jump and non-jump, so I only needed 1 potential RAM value filled. That meant 1 address sister register and 1 RAM value sister register. Even then, the system was slow to update the RAM to the correct values. In your case, however, it could get quite messy...
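A minimal Python sketch of the sister-register checkpoint/rollback scheme described above. The class and method names are hypothetical, and it models only the register file, not the RAM address/value sister registers.

```python
class SpeculativeRegisterFile:
    """Hypothetical model: main registers plus sister copies frozen at a branch."""

    def __init__(self, n: int) -> None:
        self.main = [0] * n
        self.sister = [0] * n
        self.speculating = False

    def write(self, idx: int, value: int) -> None:
        self.main[idx] = value
        if not self.speculating:
            # With no branch pending, the sister copy tracks the main register.
            self.sister[idx] = value

    def begin_branch(self) -> None:
        # Freeze the sister copies at the branch point and assume the jump is taken.
        self.speculating = True

    def resolve_branch(self, guess_correct: bool) -> None:
        if guess_correct:
            # Jump was right: resync the sisters and resume normal tracking.
            self.sister = list(self.main)
        else:
            # Jump was wrong: restore main registers to the branch point.
            self.main = list(self.sister)
        self.speculating = False
```

Every register write during the speculative window is kept only in `main`, so a misprediction costs one bulk copy back from `sister`.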
While I don't plan to use this CPU for programs that require frequent branching, I do have a few ways to anticipate branching. One is simply counting the number of times I've iterated a section of code and branching when I reach the target (good for multiplication, division, etc.). Another is using test numbers from a table to check whether a number falls within parameters before it is used in a calculation, and storing the test results in a register for reference when the branching step comes around. One particularly easy flag to anticipate is the carry out, since it is calculated 2 ticks before the output of the ALU, giving me enough time to halt the pipeline if needed. But again, I will largely be using this ALU to perform operations on large sets of data, for things like line drawing and second- and third-order arithmetic.
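A small sketch of the counted-loop idea: multiplication by repeated addition, leaving the loop when an iteration counter reaches its target, with the carry out collected as an overflow flag. The 8-bit word width and the function names are hypothetical; the thread does not state the data width.

```python
WORD_BITS = 8                  # hypothetical word width
MASK = (1 << WORD_BITS) - 1

def add_with_carry(a: int, b: int) -> tuple[int, int]:
    """Return (sum truncated to the word, carry out)."""
    total = a + b
    return total & MASK, total >> WORD_BITS

def multiply_by_counting(a: int, b: int) -> tuple[int, bool]:
    """Multiply by adding `a` to itself `b` times; returns (product, overflowed)."""
    acc = 0
    counter = 0
    overflowed = False
    while counter != b:        # branch out of the loop when the counter hits the target
        acc, cout = add_with_carry(acc, a)
        overflowed |= bool(cout)
        counter += 1
    return acc, overflowed
```

Because the exit condition depends only on the counter, the branch outcome is known well ahead of time, which is what makes this kind of loop easy to anticipate in a pipeline.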
Ok so after a few days of pondering, i've settled on what will be in my machine code:
Two 4-bit read addresses: 8 bits total
One 5-bit write address plus 2 bits to control the actual writing: 7 bits total
4 bits for ALU operations
6 bits IF and 6 bits ELSE for branching: 12 bits total. This implies 64 lines of code (2^6). The total delay across the ROM will therefore be 6 ticks
4 bits for flags: OR across the ALU, COUT, CLEAR REGISTER, and UI
3 bits for output registers, hence 8 output registers
3 bits for input registers, hence 8 input registers
1 bit to select pipelined or not (3-tick clock or 18-tick clock)
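The field widths above add up to a 42-bit instruction word. Here is a sketch of packing and unpacking such a word; the field order and bit positions are assumptions (the list gives widths only), and the field names are hypothetical.

```python
# (name, width in bits), taken from the list above; the order is an assumption.
FIELDS = [
    ("read_a", 4), ("read_b", 4),
    ("write_addr", 5), ("write_ctrl", 2),
    ("alu_op", 4),
    ("branch_if", 6), ("branch_else", 6),
    ("flags", 4),
    ("out_reg", 3), ("in_reg", 3),
    ("pipeline", 1),
]
WORD_WIDTH = sum(width for _, width in FIELDS)   # 42 bits
ROM_LINES = 1 << dict(FIELDS)["branch_if"]       # 6-bit branch target -> 64 lines

def encode(**values: int) -> int:
    """Pack named fields into one instruction word, first field in the high bits."""
    unknown = set(values) - {name for name, _ in FIELDS}
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    word = 0
    for name, width in FIELDS:
        v = values.get(name, 0)
        if not 0 <= v < (1 << width):
            raise ValueError(f"{name}={v} does not fit in {width} bits")
        word = (word << width) | v
    return word

def decode(word: int) -> dict[str, int]:
    """Unpack an instruction word back into its named fields."""
    out = {}
    for name, width in reversed(FIELDS):
        out[name] = word & ((1 << width) - 1)
        word >>= width
    return out
```

For example, `decode(encode(read_a=3, alu_op=9, pipeline=1))` gives back those three values with every other field zero.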
The functions on the program memory will be flexible and have some portions that can be edited. Specifically, the read and write addresses will be able to be incremented from the starting address which will also be determined in assembly. This means that every set of data will have 2 numbers attached to it: the address of the 0th piece of data in the set, and the number of pieces of data in the set.
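A sketch of the "start address plus count" idea: each data set is described by its base address and length, and the read and write addresses are generated by incrementing from the base. The 4-bit read and 5-bit write address widths come from the instruction format; the `DataSet` type and helper names are hypothetical.

```python
from typing import Iterator, NamedTuple

class DataSet(NamedTuple):
    base: int    # address of the 0th piece of data
    length: int  # number of pieces of data in the set

def addresses(ds: DataSet, address_bits: int) -> Iterator[int]:
    """Yield each address in the set by incrementing from the base address."""
    if ds.base < 0 or ds.length < 0 or ds.base + ds.length > (1 << address_bits):
        raise ValueError("data set does not fit in the address space")
    for offset in range(ds.length):
        yield ds.base + offset

def batch_addresses(src_a: DataSet, src_b: DataSet, dst: DataSet) -> list[tuple[int, int, int]]:
    """(read A, read B, write) address triples for an element-wise batch operation."""
    if not (src_a.length == src_b.length == dst.length):
        raise ValueError("data sets must have the same length")
    return list(zip(addresses(src_a, 4), addresses(src_b, 4), addresses(dst, 5)))
```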
External hardware is still being worked on, but accommodating any changes should not be an issue because of the number of output and input registers.
I've finally made the inputs to the CPU and as soon as 1.5 comes out I will continue my series of videos.
I also plan on making the processor dual core, since this core will specialize in batch processing. The other will specialize in two-number operations and less graphics-oriented processes. As with most multicore processors, both will have separate first-level caches but will share the second-level cache and I/O buses.
And I have almost finished the CPU. After I have coded in the functions I will begin working on the whole computer. It should be a fantastic graphics processor.
Dude, nice work! You should hook it up to a seven-segment display and use number cards to display the numbers for easy-to-read calculations. Cool stuff.
Edit:
Sorry about that, back then I was way too into seven-segment displays, but now I never use them because they are so useless.
A single 7-segment display would be offensive when attached to this, considering its main purpose is graphics processing. I'm guessing it's going to use a color map display, but can't be certain. It does look like it would fit nicely under one.
I've been working on my own 4-bit pipelined ALU, and suspiciously it's on a 3-tick pulse... but with 0 ticks of delay on the buses, and the fastest I've run it was 5 ticks.
Sweet job
hahaha thanks XD
You should probably call a doctor
If you're interested, there are many people on the RDF who would love to teach you.
Yey! Can't wait to see you in game again!
Now it is