DASL (Dynamic Application-Specific Logic) refers to a type of computer I originally described in a previous post. This post uses a simple program to demonstrate how a DASL machine can execute a program much faster and much more efficiently than a traditional CPU-based computer.
Say you want to get the average of a list of numbers from a column in a table.
A typical CPU-based computer might load one number from main memory into a CPU register (a small piece of memory inside the CPU itself), read the next number into another register and add them, placing the result into one of the initial registers. This would repeat until all numbers are added. Along the way, it might keep a count of how many numbers have been added in another register, and when all numbers have been added, it would divide the sum by the count and move the result of the division to main memory.
Something like this:
LDA $c3 ; load memory location $c3 into the accumulator
INX ; increment the X register (the count, assumed to start at 0)
ADC $c4 ; add memory location $c4 to the accumulator
INX
ADC $c5
INX
ADC $c6
INX
ADC $c7
INX
ADC $c8
INX
ADC $c9
INX
ADC $ca
INX
ADC $cb
INX
DIVX ; divide the accumulator by the X register, leaving the result in the accumulator
MOV $cd ; move the value in the accumulator to memory location $cd
Assembly language pseudocode to average a list of nine numbers.
Note: there are certainly better ways to do this, but I'll elaborate on that at the end of the post.
Assuming each instruction takes one clock cycle, averaging a list of nine numbers takes at least twenty cycles.
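To make the cycle count concrete, here is a small Python model of the listing. It is a sketch that assumes, as the post does, exactly one clock cycle per instruction; the function name `sequential_average` is hypothetical, and each Python statement simply stands in for the pseudocode instruction named in its comment.

```python
def sequential_average(values):
    """Average `values` the way the listing does, counting one cycle per instruction.

    Assumption (from the post): every instruction takes exactly one clock cycle.
    """
    if not values:
        raise ValueError("need at least one value")
    cycles = 0
    acc = values[0]   # LDA: load the first value into the accumulator
    cycles += 1
    x = 1             # INX: X counts the values summed so far (starts at 0)
    cycles += 1
    for v in values[1:]:
        acc += v      # ADC: add the next value to the accumulator
        x += 1        # INX: bump the count
        cycles += 2
    acc = acc / x     # DIVX: divide the accumulator by the count in X
    cycles += 1
    result = acc      # MOV: write the result back to main memory
    cycles += 1
    return result, cycles

print(sequential_average(list(range(1, 11))))  # (5.5, 22)
```

For n values the model counts 2n + 2 cycles (one load, n - 1 additions, n increments, one division and one store), so the cost grows linearly with the length of the list.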
A multiprocessor system could speed this up by splitting the list into smaller lists, one per available CPU, and performing the addition step for each sublist on a separate processor. This could reduce the execution time of the addition step to something close to (number of rows) / (number of CPUs), although additional cycles are consumed splitting the data up beforehand and combining the partial sums afterward.
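The multiprocessor version can be sketched the same way. This is a hypothetical cost model, not a real scheduler: it assumes contiguous chunks, one cycle per addition, all chunks summed at the same time, and the partial sums then combined one at a time on a single CPU. The helper names `split` and `parallel_average` are made up for illustration.

```python
def split(values, p):
    """Split values into at most p contiguous chunks of nearly equal size."""
    if p < 1 or not values:
        raise ValueError("need at least one CPU and one value")
    size = -(-len(values) // p)  # ceiling division: values per chunk
    return [values[i:i + size] for i in range(0, len(values), size)]

def parallel_average(values, p):
    """Return (average, estimated addition cycles) when using p CPUs.

    Assumptions: one cycle per addition, all chunks summed simultaneously,
    then the partial sums added together one at a time.
    """
    chunks = split(values, p)
    # On a real multiprocessor each chunk would be summed on its own CPU;
    # here they are simply summed one after another.
    partials = [sum(chunk) for chunk in chunks]
    total = sum(partials)
    longest = max(len(chunk) for chunk in chunks)
    est_cycles = (longest - 1) + (len(partials) - 1)
    return total / len(values), est_cycles

print(parallel_average(list(range(1, 10)), 3))  # (5.0, 4)
```

With three CPUs, the eight additions for a nine-value list drop to an estimated four cycles: two inside each chunk plus two to combine the partial sums. That is close to, but a bit worse than, 9 / 3, and the combining work is part of the overhead that keeps the speedup below the ideal.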
A DASL machine does something similar to the multiprocessor system, but instead of dividing the work across a fixed number of general-purpose CPUs, it can synthesize as many single-purpose "processors" as will fit in the machine. Each processor contains only enough logic to execute its required operation (in this case, addition, plus one processor for division), so these processors can be much smaller and much simpler than a general-purpose CPU.
In this example, the maximum number of additions that can occur simultaneously is four (the input count divided by two, rounded down), so the first stage consists of four addition "processors" taking two numbers each and one "do-nothing" processor that passes the ninth number through (because the input does not split evenly). A second stage adds the outputs from the first stage, and so on until a single value is yielded.
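The staged structure can be modeled as a level-by-level (tree) reduction. This Python sketch assumes values are paired left to right and that an odd value left over at the end of a stage goes through a "do-nothing" pass-through; `tree_sum` is a hypothetical name, and a DASL machine would evaluate each stage in logic rather than in a loop.

```python
def tree_sum(values):
    """Return (total, stages), where stages[k] lists the values leaving stage k + 1.

    Assumption: values are paired left to right, and an odd value left over
    at the end of a stage is passed through unchanged ("do-nothing" unit).
    """
    if not values:
        raise ValueError("need at least one value")
    level = list(values)
    stages = []
    while len(level) > 1:
        # One adder per pair of neighbouring values.
        nxt = [level[i] + level[i + 1] for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd count: the last value has no partner
            nxt.append(level[-1])   # pass it through to the next stage
        stages.append(nxt)
        level = nxt
    return level[0], stages

total, stages = tree_sum([1, 2, 3, 4, 5, 6, 7, 8, 9])
for k, out in enumerate(stages, start=1):
    print(k, out)
# 1 [3, 7, 11, 15, 9]
# 2 [10, 26, 9]
# 3 [36, 9]
# 4 [45]
```

Nine inputs need four adder stages (9 → 5 → 3 → 2 → 1 values). With this particular pairing, the ninth value is carried through unchanged at stages 1 to 3 before it is added at stage 4.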
The resulting processor structure to sum the list looks something like this:

Once the list has been summed, a single "division" processor (omitted from the diagram above for clarity) divides the total by the list length.
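Because every unit in the structure is single-purpose, it is easy to count how many the machine would have to synthesize for n inputs. This hypothetical `unit_counts` helper assumes the same left-to-right pairing with odd values passed through, plus the single division unit; it counts units only and ignores wiring and any input/output logic.

```python
def unit_counts(n):
    """Count the single-purpose units an n-input averaging structure needs.

    Assumptions: pairs are added left to right, an odd leftover at each stage
    goes through a pass-through unit, and one divider sits at the end.
    """
    if n < 1:
        raise ValueError("need at least one input")
    adders = passes = stages = 0
    width = n                          # values entering the current stage
    while width > 1:
        adders += width // 2           # one adder per pair
        passes += width % 2            # one pass-through if a value is left over
        width = width // 2 + width % 2
        stages += 1
    return {"adders": adders, "pass_through": passes,
            "adder_stages": stages, "dividers": 1}

print(unit_counts(9))
# {'adders': 8, 'pass_through': 3, 'adder_stages': 4, 'dividers': 1}
```

Since each addition merges two values into one, the adder count is always n - 1 regardless of the pairing, while the number of stages grows only with the logarithm of n.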
In a CPU-based computer, you would expect each layer to consume at least one clock cycle, but in a DASL machine the output of this processing structure is available as soon as the last input value is loaded. In essence, the DASL machine completes the entire program for all of the data in the equivalent of one of the traditional CPU's clock cycles.
In addition to reducing the cycles needed to perform the operations, the DASL machine eliminates the need to move data back and forth between registers and main memory. Only the initial input and final output need to leave the TUB die; all other memory used for the computation is embedded in the processor itself, akin to registers in a traditional CPU but arranged perfectly to suit the application. In fact, the intermediate results don't need to be stored at all, as they are fed directly into the logic gates of the subsequent stage.
Typical CPUs have special instructions to accelerate common cases like this, but that doesn't contradict the point of this comparison. While this specific example could be accelerated in a number of ways on a modern CPU, the CPU is still limited to the optimizations it leaves the factory with, both in terms of the operations available and the amount of data they can process. The DASL machine can synthesize a special "instruction" for any problem, not only the problems the manufacturer anticipates, and it can implement these instructions in highly optimized ways, tailored to the data, that a static CPU implementation cannot. Furthermore, by using data locality to eliminate the need to move data in and out of the processor, the DASL machine spends much less time on non-computational activities, deferring slower input-output operations until the final result is obtained.