Altera has announced a family of in-system programmable complex PLDs (ISP CPLDs). No doubt others are available or on the way. (Note: Xilinx parts have always been reprogrammable.)
(For the benefit of those outside the electronics field:) this is a device which can be programmed (much like the non-volatile RAM that saves your PC BIOS settings), not just to store data but to do arithmetic, or any kind of logical operation possible with digital electronics. Suppose you want to add three 15-bit numbers together. Fine: program a 15-bit 3-port adder to do the job in one clock cycle.
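As a rough software model of the adder example above, the following Python sketch mimics what such a block would compute. It is not a hardware description, and the names WIDTH, MASK and add3 are invented here for illustration. The one point it makes is that the output must be wider than the inputs: three 15-bit operands can sum to a 17-bit value.

```python
WIDTH = 15                      # operand width in bits (from the example in the text)
MASK = (1 << WIDTH) - 1         # largest 15-bit value: 32767


def add3(a, b, c):
    """Model a combinational 3-input adder for unsigned 15-bit operands.

    In hardware all three operands would be summed within one clock cycle;
    here the function call stands in for that single cycle.
    """
    for x in (a, b, c):
        if not 0 <= x <= MASK:
            raise ValueError(f"operand {x} does not fit in {WIDTH} bits")
    # Worst case is 3 * 32767 = 98301, which needs 17 bits, so the
    # output port must be two bits wider than the inputs.
    return a + b + c


if __name__ == "__main__":
    total = add3(MASK, MASK, MASK)
    print(total, total.bit_length())
```

A real design would express the same thing in VHDL or a schematic; the sketch only shows the arithmetic the programmed device would perform.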
The Altera ISP devices are programmable through the JTAG port (an in-system diagnostic test port). The designs may be created in a variety of ways, such as using schematic editors or a PLD programming language, but one of the more interesting is using VHDL (an open standard hardware description language). VHDL is (very loosely) similar to Pascal or another high-level computer programming language, with many extensions to support timing, simultaneity, and hardware data types (bit, register). There are an increasing number of tools and models available, including a prototype IEEE floating-point library, 8085 processor and DLX 32-bit processor models. Proprietary models (will) include such things as a PCI interface. See e.g. Wikipedia for more information.
Bringing these together, one can envisage the following system: A host (PCI PC, VME) with slot(s) containing board(s) with one or more ISP CPLDs or ISP FPGAs and probably some cache SRAM and DRAM. See http://andrew.triumf.ca/CPLD-CPU/CPLD-CPU.html . The bus interface would probably be implemented using the CPLD itself. To run a compute-intensive task, one would write the program in VHDL, using available math, PCI, VME etc. libraries as one would in C++, F90, etc. For debugging, one would use a VHDL simulator. Then one would compile the VHDL to the target CPLD (possibly to multiple chips) and download the code. The program would then run in the CPLD, either using its on-board memory or main memory via DMA.
Although programmable devices are inherently slower than custom silicon, the efficiency of the algorithm is the overriding factor in the final speed of a piece of software. So the CPLD is not going to rival a Pentium, say, at something the Pentium is designed to do, such as taking a series of 32-bit numbers from memory, calculating the squares, and writing the answers back. If, however, what you want is to calculate the area of 2300 triangles, the CPLD might be able to do that in 2305 clock cycles using a pipelined design: once the pipeline has filled, one result emerges on every clock cycle. If you need to add 25 pairs of 6-bit numbers, it is possible to create 25 6-bit adders, taking one clock cycle to complete all 25 operations. The CPLD is suited to complex operations such as MPEG decoding or 3-D rendering, where the economics of quantity or time rule out a custom ASIC. Using an ISP CPLD, one can have a generic device with the power of a semicustom solution.
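The cycle count in the triangle example can be illustrated with a small Python simulation of a pipeline, offered as a sketch under stated assumptions rather than a real design. The text does not say how the pipeline is divided, so the six stages used here are hypothetical: with S single-cycle stages and N inputs the total is N + S - 1 cycles, and a depth of six is one way to arrive at 2305 for 2300 triangles. The area formula is the standard shoelace formula, and all function names are invented for illustration.

```python
EMPTY = object()  # marks a pipeline register that holds no data (a "bubble")


def run_pipeline(inputs, stages):
    """Clock inputs through single-cycle stages; return (results, cycles)."""
    regs = [EMPTY] * len(stages)      # regs[i] holds the output of stage i
    feed = iter(inputs)
    results, cycles = [], 0
    while True:
        nxt = next(feed, EMPTY)
        # Stop when nothing is left to feed and nothing is still in flight.
        # The last register is excluded: its value was already collected.
        if nxt is EMPTY and all(r is EMPTY for r in regs[:-1]):
            break
        cycles += 1
        # Update back to front so each stage sees the previous cycle's value.
        for i in range(len(stages) - 1, 0, -1):
            regs[i] = EMPTY if regs[i - 1] is EMPTY else stages[i](regs[i - 1])
        regs[0] = EMPTY if nxt is EMPTY else stages[0](nxt)
        if regs[-1] is not EMPTY:
            results.append(regs[-1])
    return results, cycles


# Hypothetical six-stage split of the shoelace formula:
# area = |x1*(y2 - y3) + x2*(y3 - y1) + x3*(y1 - y2)| / 2
def differences(t):
    (x1, y1), (x2, y2), (x3, y3) = t
    return (x1, x2, x3, y2 - y3, y3 - y1, y1 - y2)


def products(v):
    x1, x2, x3, d1, d2, d3 = v
    return (x1 * d1, x2 * d2, x3 * d3)


def partial_sum(p):
    return (p[0] + p[1], p[2])


def final_sum(p):
    return p[0] + p[1]


def magnitude(s):
    return abs(s)


def halve(s):
    return s / 2


STAGES = [differences, products, partial_sum, final_sum, magnitude, halve]

if __name__ == "__main__":
    triangles = [((0, 0), (4, 0), (0, 3))] * 2300
    areas, cycles = run_pipeline(triangles, STAGES)
    print(len(areas), cycles)
```

A processor that handles one triangle at a time would need several operations per triangle, whereas the pipeline pays its depth only once. A different stage split would change the fill time but not the one-result-per-cycle behaviour.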
At present, programming is relatively slow as it is performed using a serial interface. Future technology may allow more rapid programming.
Possibilities for the future include such things as a C++ to VHDL parallelizing compiler, and a computer made entirely from programmable hardware that reconfigures itself for the tasks at hand. If the Pentium had been made using such a technology, the infamous floating-point bug could have been fixed by a click on a Web site. For a sketch of a possible architecture, see http://andrew.triumf.ca/CPLD-CPU/CPLD-processor.html .