pdp 0.13 design layers + components
-----------------------------------

from version 0.13 onwards, pdp is no longer just a pd plugin but a 
standalone unix library (libpdp). this document is an attempt to 
describe the design layers.

A. PD INTERFACE
---------------

on the top level, libpdp is interfaced to pd using a glue layer which 
consists of

1. pdp/dpd protocols for pd
2. process queue
3. base classes for pdp/dpd
4. some small utility pd objects
5. pd specific interfaces to part of pdp core
6. pdp_console
7. pd object interface to packet forth (pdp object)


1. is the same as previous versions to ensure backwards compatibility in 
pd with previous pdp modules and extensions that are written as pd 
externs or external libs. this includes parts of pdp that are not yet 
migrated to libpdp (some of them are very pd specific and will not be 
moved to libpdp), and pidip. if you intend to write new modules, you are 
encouraged to use the new forth based api, so your code can become part 
of libpdp and be used in other image processing applications.

2. is considered a pd part. it implements multithreading of pdp inside 
pd. multithreading is considered part of the host interface, since it 
usually requires host specific code.

3. the base classes (pd objects) for pdp image processing remain part of 
the pd<->pdp layer. the reason is the same as for 1.: a lot of the 
original pd style pdp is written as subclasses of the pdp_base, 
pdp_image_base, dpd_base and pdp_3dp_base classes. if you need to write 
pd specific code, you are still encouraged to use these apis, since they 
eliminate a lot of red tape involving the pdp protocol. a disadvantage 
is that this api is badly documented, whereas the basic api (1.) is a 
lot simpler to learn and is documented. 3dp is supposed to merge into 
the new forth api, along with the image/video processing code.

4. is small enough to be ignored here

5. includes interfaces to thread system and type conversion system + 
some pd specific stuff using 1. or 3.

6. the console interface to the pdp core, basically a console for a 
forth like language called packet forth, which is pdp's main scripting 
language. it's intended for developing and testing pdp, but it can be 
used to write controllers for pd/pdp/... too. this is based on 1.

7. is the main link between the new libpdp and pd. it is used to 
instantiate pdp processors in pd which are written in packet forth. 
e.g. to create a mixer, you instantiate a [pdp mix] object, which would 
be the same as the previous [pdp_mix] object. each [pdp] object creates 
a forth environment, which is initialized by calling a forth init 
method. [pdp mix] would call the forth word init_mix to create the local 
environment for the mix object. wrappers will be included for backward 
compatibility when the image processing code is moved to libpdp.
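
The naming convention described in 7. can be sketched in C as a dictionary 
lookup. This is only an illustration: env_t, the dictionary table, 
init_gain and sketch_object_new are hypothetical names made up for this 
sketch and are not part of libpdp; only the "init_" + name convention 
comes from the text above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the per-object forth environment. */
typedef struct { const char *kind; } env_t;

typedef void (*init_word_t)(env_t *);

/* Hypothetical init words; in pdp these would be forth words. */
static void init_mix(env_t *e)  { e->kind = "mix"; }
static void init_gain(env_t *e) { e->kind = "gain"; }

static const struct { const char *name; init_word_t word; } dictionary[] = {
    { "init_mix",  init_mix  },
    { "init_gain", init_gain },
};

/* Instantiate [pdp <name>]: build the word name init_<name>, look it up
   and run it on a fresh environment.
   Returns 0 on success, -1 if the word does not exist. */
int sketch_object_new(const char *name, env_t *env)
{
    char word[64];
    assert(name != NULL && env != NULL);
    int n = snprintf(word, sizeof word, "init_%s", name);
    if (n < 0 || (size_t)n >= sizeof word) return -1;
    for (size_t i = 0; i < sizeof dictionary / sizeof dictionary[0]; i++) {
        if (strcmp(dictionary[i].name, word) == 0) {
            env->kind = NULL;
            dictionary[i].word(env);
            return 0;
        }
    }
    return -1;
}
```

With this sketch, sketch_object_new("mix", &env) runs init_mix, while an 
unknown name fails cleanly instead of creating a half initialized object.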


B. PDP SYSTEM CODE
------------------

1. basic building blocks: symbol, list, memory manager
2. packet implementation (packet class and reuse queue)
3. packet type conversion system
4. os interface (X11, net, png, ...)
5. packet forth
6. additional libraries


1. pdp makes intensive use of symbols and lists (trees, stacks, queues). 
pdp's namespace is built on the symbol implementation. a lot of other 
code uses the list implementation.

2. the pdp packet model is very simple: basically nothing more than 
constructors (new, copy, clone) and destructors (mark_unused, for 
later reuse, and delete). there is no real object model for 
processors. this is a 
conscious decision. processor objects are implemented as packet forth 
processes with object state stored in process data space. this is enough 
to interface the functionality present in packet forth code to any 
arbitrary object oriented language or system.
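
A minimal sketch of such a packet model with a reuse list, assuming 
packets are plain size-tagged buffers. The names packet_t, packet_new, 
packet_copy, packet_mark_unused and packet_delete are chosen for this 
sketch and are not the libpdp api; clone is left out, and the reuse list 
here is a simple last-in-first-out list, which may differ from the real 
reuse queue.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical packet: a size-tagged buffer. Real pdp packets also carry
   a type description, which is left out here. */
typedef struct packet {
    size_t size;
    int in_use;
    struct packet *next_free;   /* link in the reuse list */
    unsigned char data[];
} packet_t;

static packet_t *reuse_list = NULL;

/* new: take a packet of the same size from the reuse list if there is
   one, otherwise allocate fresh memory. */
packet_t *packet_new(size_t size)
{
    for (packet_t **p = &reuse_list; *p; p = &(*p)->next_free) {
        if ((*p)->size == size) {
            packet_t *hit = *p;
            *p = hit->next_free;
            hit->next_free = NULL;
            hit->in_use = 1;
            return hit;
        }
    }
    packet_t *pk = malloc(sizeof *pk + size);
    if (!pk) return NULL;
    pk->size = size;
    pk->in_use = 1;
    pk->next_free = NULL;
    return pk;
}

/* copy: a new packet with the same contents. */
packet_t *packet_copy(const packet_t *src)
{
    assert(src != NULL && src->in_use);
    packet_t *dst = packet_new(src->size);
    if (dst) memcpy(dst->data, src->data, src->size);
    return dst;
}

/* mark_unused: keep the memory around for later reuse. */
void packet_mark_unused(packet_t *pk)
{
    assert(pk != NULL && pk->in_use);
    pk->in_use = 0;
    pk->next_free = reuse_list;
    reuse_list = pk;
}

/* delete: give the memory back. Only valid for packets that are in use,
   i.e. not sitting on the reuse list. */
void packet_delete(packet_t *pk)
{
    assert(pk != NULL && pk->in_use);
    free(pk);
}
```

The point of mark_unused is that the next packet_new of the same size 
gets the old memory back without going through the system allocator.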

3. each packet type can register conversion methods to other types. the 
type conversion system does the casting. this is not completely finished 
yet (no automatic multistage casting yet) but most of it is in place and 
usable. no types without casts.
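
The registration idea can be sketched as a table of conversion functions 
keyed by (from, to) type names. Everything here is an assumption made for 
illustration: the function names, the type names "s16", "float" and 
"percent", and the use of plain doubles instead of packets. Like the 
system described above, the sketch only does single stage casts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CONVERSIONS 16

/* A conversion maps a value of type `from` to a value of type `to`.
   Values are plain doubles to keep the sketch short. */
typedef double (*convert_fn)(double);

typedef struct { const char *from, *to; convert_fn fn; } conversion_t;

static conversion_t table[MAX_CONVERSIONS];
static size_t n_conversions = 0;

/* Two example conversions (hypothetical types). */
double s16_to_float(double x)     { return x / 32768.0; }
double float_to_percent(double x) { return x * 100.0; }

/* Register a direct conversion. Returns -1 when the table is full. */
int register_conversion(const char *from, const char *to, convert_fn fn)
{
    assert(from != NULL && to != NULL && fn != NULL);
    if (n_conversions == MAX_CONVERSIONS) return -1;
    table[n_conversions].from = from;
    table[n_conversions].to = to;
    table[n_conversions].fn = fn;
    n_conversions++;
    return 0;
}

/* Single stage cast: succeeds if the types already match or a direct
   conversion is registered. It does not chain through intermediates. */
int convert(const char *from, const char *to, double in, double *out)
{
    assert(from != NULL && to != NULL && out != NULL);
    if (strcmp(from, to) == 0) { *out = in; return 0; }
    for (size_t i = 0; i < n_conversions; i++) {
        if (!strcmp(table[i].from, from) && !strcmp(table[i].to, to)) {
            *out = table[i].fn(in);
            return 0;
        }
    }
    return -1;
}
```

With s16->float and float->percent registered, a request for 
s16->percent fails in this sketch, which is the missing multistage case.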

4. os specific modules for input/output. not much fun here..

5. All of pdp is glued together with a scripting language called packet 
forth. This is a "high level" forth dialect that can operate on floats, 
ints, symbols, lists, trees and packets. It is a "fool proof" forth, 
which is polymorphic and relatively robust to user errors (read: it 
should not crash or cause memory leaks when experimenting). It is 
intended to serve as a packet level glue language, so it is not very 
efficient for scalar code. This is usually not a problem, since packet 
operations (esp. image processing) are much more expensive than the 
thin layer of glue connecting them.

All packet operations can be accessed in the forth. If you've ever 
worked with HP's RPN calculators, you can use packet forth. The basic 
idea is to write objects in packet forth that can be used in pd or in 
other image processing applications. For more information on packet 
forth, see the code (pdp_forth.h, pdp_forth.c and words.def)
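
To give a feel for what "polymorphic" and "fool proof" mean for an RPN 
style stack language, here is a small C sketch of a tagged stack with a 
polymorphic add word. It is not taken from pdp_forth.c: the names, the 
restriction to ints and floats, and the rule that int + float gives a 
float are assumptions of this sketch. Errors are reported as return 
values instead of corrupting the stack.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { T_INT, T_FLOAT } tag_t;

/* A stack cell carries its own type tag. */
typedef struct {
    tag_t tag;
    union { int i; double f; } v;
} cell_t;

#define STACK_MAX 32

typedef struct {
    cell_t cells[STACK_MAX];
    size_t depth;
} rpn_stack;

/* Push words fail with -1 instead of overflowing the stack. */
int rpn_push_int(rpn_stack *s, int i)
{
    assert(s != NULL);
    if (s->depth == STACK_MAX) return -1;
    s->cells[s->depth].tag = T_INT;
    s->cells[s->depth].v.i = i;
    s->depth++;
    return 0;
}

int rpn_push_float(rpn_stack *s, double f)
{
    assert(s != NULL);
    if (s->depth == STACK_MAX) return -1;
    s->cells[s->depth].tag = T_FLOAT;
    s->cells[s->depth].v.f = f;
    s->depth++;
    return 0;
}

static double as_float(const cell_t *c)
{
    return c->tag == T_INT ? (double)c->v.i : c->v.f;
}

/* Polymorphic "+": int int -> int, anything else -> float.
   On stack underflow it returns -1 and leaves the stack untouched. */
int rpn_add(rpn_stack *s)
{
    assert(s != NULL);
    if (s->depth < 2) return -1;
    cell_t *a = &s->cells[s->depth - 2];
    const cell_t *b = &s->cells[s->depth - 1];
    if (a->tag == T_INT && b->tag == T_INT) {
        a->v.i += b->v.i;
    } else {
        double sum = as_float(a) + as_float(b);
        a->tag = T_FLOAT;
        a->v.f = sum;
    }
    s->depth--;
    return 0;
}
```

In RPN terms, pushing 1 and 2 and adding leaves the int 3 on the stack; 
pushing 0.5 and adding again leaves the float 3.5.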

6. the opengl lib, based on dpd (3.), which will be moved to packet 
forth words, and the cellular automata lib, which will be moved to 
vector/slice forth later.


C. LOW LEVEL CODE
-----------------

All the packet processing code is (will be) exported as packet forth 
words. This section is about how the code exported by those words is 
structured.

C.1: IMAGE PROCESSING: VECTOR FORTH

Eventually, image operations need to be implemented, and in order 
to do this efficiently, both code-wise (good modularity) and execution 
speed-wise, i've opted for another forth. DSP and forth seem to mix well, once 
you get the risc pipeline issues out of the way. And, as a less rational 
explanation, forth has this magic kind of feel, something like art.. 
well, whatever :)

As opposed to the packet forth, this is a "real" lowlevel forth 
optimized for performance. Its reason for being is to solve 3 
problems: image processing code factoring, quasi optimal processor 
pipeline & instruction usage, and locality of reference for maximum 
cache performance. Expect crashes when you start experimenting with 
this. It's really nothing more than a fancy macro assembler. It has no 
safety belts. Chuck Moore doctrine.. 

The distinction between the two forths is at first sight not a good 
example of minimizing glue layers. I.e. both systems (packet script 
forth and low level slice forth) are forths in essence, requiring 
(partial) design duplication. Both implementations are however 
significantly different, which justifies this design duplication.

Apart from the implementation differences, the purpose of both languages 
is not the same. This requires the designs of both languages to be 
different in some respect. So, with the rule of "do everything right 
once" in mind, this small remark tries to justify the fact that forth != 
forth.

The only place where apparent design correspondence (the language model)
is actually used is in the interface between the packet forth and the 
slice forth. 

The base forth is an ordinary minimal 32bit (or machine word 
length) subroutine threaded forth, with a native code kernel for 
i386/mmx, a portable c code kernel and room for more native code kernels 
(e.g. i386/sse/sse2/3dnow, altivec, dsp, ...). Besides support for native 
machine word ints and pointers (no floats, since they clash with mmx, 
are not needed for the fixed point image type, and can be implemented 
using other vector instructions when needed), it has support for slices 
and a separate "vector stack". 

Vectors are the native machine vectors, i.e. 64bit for mmx/3dnow, 
128bit for sse/sse2, or anything else. The architecture is supposed to 
be open. (I've been thinking of adding a block forth, operating on 256bit 
blocks to eliminate pipeline issues). Blocks are just blocks of vectors 
which are used as the basic data size grain for loop unrolling, to solve 
pipeline issues in slice processing words.

Slices are just arrays of blocks. In the mmx forth kernel, they can 
represent 4 scanlines or a 4 colour component scanline, depending on how 
they are fed from packet data. Slices can be anything, but right now, 
they're just scanlines. The forth kernel has a very simple and efficient
(branchless) reference count based memory allocator for slices. This 
slice allocator is also stack based, which ensures locality of reference: 
a newly allocated slice is the most recently deallocated one.
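
A sketch in C of a stack based, reference counted slice allocator. The 
pool size, slice size and all names are assumptions of this sketch, and 
unlike the kernel's allocator it makes no attempt to be branchless. What 
it does show is the stack discipline: the slice that was freed last is 
the one handed out next, so it is likely still in cache.

```c
#include <assert.h>
#include <stddef.h>

#define N_SLICES   8     /* assumed pool size */
#define SLICE_SIZE 64    /* assumed slice length in 16bit words */

typedef struct {
    int   refcount;
    short data[SLICE_SIZE];
} slice_t;

typedef struct {
    slice_t  pool[N_SLICES];
    slice_t *free_stack[N_SLICES];  /* stack of free slices */
    size_t   n_free;                /* number of entries on the stack */
} slice_allocator;

void slice_allocator_init(slice_allocator *a)
{
    assert(a != NULL);
    a->n_free = N_SLICES;
    for (size_t i = 0; i < N_SLICES; i++) {
        a->pool[i].refcount = 0;
        a->free_stack[i] = &a->pool[i];
    }
}

/* Pop the most recently freed slice. Returns NULL when the pool is empty. */
slice_t *slice_alloc(slice_allocator *a)
{
    assert(a != NULL);
    if (a->n_free == 0) return NULL;
    slice_t *s = a->free_stack[--a->n_free];
    s->refcount = 1;
    return s;
}

/* Take an extra reference on a live slice. */
void slice_ref(slice_t *s)
{
    assert(s != NULL && s->refcount > 0);
    s->refcount++;
}

/* Drop a reference; the last one pushes the slice back on the stack. */
void slice_unref(slice_allocator *a, slice_t *s)
{
    assert(a != NULL && s != NULL && s->refcount > 0);
    if (--s->refcount == 0)
        a->free_stack[a->n_free++] = s;
}
```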

The reason for this obsession with slices is that a lot of video 
effects can be chained on the slice level (scanline or bunch of 
scanlines), which improves speed through more locality of reference. 
Concretely, intermediate results are not flushed to slower memory. The 
same principles can be used to do audio dsp, but that's for later.
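
The difference between chaining effects per slice and running one full 
image pass per effect can be sketched in portable C. The two effects 
(op_halve, op_invert) and the function names are made up for this 
sketch. Both functions compute the same result; only the loop order 
differs, and with it the amount of data that has to travel between 
cache and main memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A slice operation works in place on one scanline of 16bit pixels. */
typedef void (*slice_op)(int16_t *slice, size_t n);

void op_halve(int16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) s[i] = (int16_t)(s[i] / 2);
}

void op_invert(int16_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++) s[i] = (int16_t)(-s[i]);
}

/* Chained: run the whole effect chain on one scanline before moving on
   to the next, so the scanline stays in cache between effects. */
void process_chained(int16_t *image, size_t width, size_t height,
                     const slice_op *chain, size_t n_ops)
{
    assert(image != NULL && chain != NULL);
    for (size_t y = 0; y < height; y++)
        for (size_t k = 0; k < n_ops; k++)
            chain[k](image + y * width, width);
}

/* Reference: one full pass over the image per effect. For large images
   every pass has to pull the image through the cache again. */
void process_per_image(int16_t *image, size_t width, size_t height,
                       const slice_op *chain, size_t n_ops)
{
    assert(image != NULL && chain != NULL);
    for (size_t k = 0; k < n_ops; k++)
        for (size_t y = 0; y < height; y++)
            chain[k](image + y * width, width);
}
```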

The mmx forth kernel is further factored down following another 
virtual machine paradigm. After doing some profiling, i came to the 
conclusion that the only, single paradigm way of writing efficient 
vector code on today's machines is multiple accumulators to avoid 
pipeline stalls. The nice thing about image processing is that it 
parallelizes easily. Hence the slice/block thing. This leads to the 
1-operand virtual machine concept for the mmx slice operations. The 
basic data size is one 4x4 pixel block (16bit pixels), which is 
implemented as asm macros in mmx-sliceops-macro.s and used in 
mmx-sliceops-code.s to build slice operations. The slice operations are 
built out of macro instructions for this 256bit or 512bit, 2 or 1 
register virtual machine which has practically no pipeline delays 
between its instructions.
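
The 1-operand machine idea can be illustrated in portable C. This is not 
the code from mmx-sliceops-macro.s; the instruction names (vm_load, 
vm_add, vm_half, vm_store) and the averaging operation are assumptions of 
this sketch. The accumulator is one 4x4 block of 16bit pixels and is 
implicit; each instruction names at most one operand. Because every 
instruction touches 16 independent pixels, no pixel's result is needed 
again until the next instruction starts, which is what keeps a real 
pipeline busy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_PIXELS 16   /* one 4x4 block of 16bit pixels = 256 bit */

/* The block sized accumulator of the hypothetical virtual machine. */
typedef struct { int16_t px[BLOCK_PIXELS]; } block_t;

/* acc = src */
void vm_load(block_t *acc, const int16_t *src)
{
    for (size_t k = 0; k < BLOCK_PIXELS; k++) acc->px[k] = src[k];
}

/* acc += src (no saturation in this sketch) */
void vm_add(block_t *acc, const int16_t *src)
{
    for (size_t k = 0; k < BLOCK_PIXELS; k++)
        acc->px[k] = (int16_t)(acc->px[k] + src[k]);
}

/* acc /= 2 */
void vm_half(block_t *acc)
{
    for (size_t k = 0; k < BLOCK_PIXELS; k++)
        acc->px[k] = (int16_t)(acc->px[k] / 2);
}

/* dst = acc */
void vm_store(const block_t *acc, int16_t *dst)
{
    for (size_t k = 0; k < BLOCK_PIXELS; k++) dst[k] = acc->px[k];
}

/* A slice operation built from the instructions: dst = (a + b) / 2.
   The slice length must be a whole number of blocks. */
void slice_average(int16_t *dst, const int16_t *a, const int16_t *b,
                   size_t n_pixels)
{
    block_t acc;
    assert(n_pixels % BLOCK_PIXELS == 0);
    for (size_t i = 0; i < n_pixels; i += BLOCK_PIXELS) {
        vm_load(&acc, a + i);
        vm_add(&acc, b + i);
        vm_half(&acc);
        vm_store(&acc, dst + i);
    }
}
```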

Since the base of sliceforth is just another forth, it could be that 
(part of) 3dp will be implemented in this lowlevel forth too, if 
performance dictates it. It's probably simpler to do it in the lowlevel 
forth than the packet forth anyway, in the form of cwords.

C.2: MATRIX PROCESSING: LIBGSL

All matrix processing packet forth words are (will be) basically wrappers 
around gsl library calls. Very straightforward.

C.3: OPENGL STUFF

The same goes for opengl. The difference here is that it uses the dpd 
protocol in pdp, which is more like the Gem way of doing things. The 
reason for this is that, although i've tried hard to avoid it, opengl 
seems to dictate a drawing context based, instead of an object based way 
of working. So processing is context (accumulator) based. Packet forth 
will probably get some object oriented, or context oriented feel when 
this is implemented.
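
A toy C sketch of what context (accumulator) based processing means, as 
opposed to passing result objects from one processor to the next. The 
context_t struct, the three operations and run_chain are invented for 
this sketch and do not reflect the actual dpd protocol or any opengl 
call: one context is handed down a chain, and each object in the chain 
either changes the accumulated state or uses it to draw.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical drawing context: the current translation plus a record
   of what was drawn. */
typedef struct {
    float tx, ty;          /* accumulated state */
    int   n_drawn;         /* number of points drawn so far */
    float last_x, last_y;  /* position of the last point drawn */
} context_t;

typedef void (*context_op)(context_t *);

void op_move_right(context_t *c) { c->tx += 1.0f; }
void op_move_up(context_t *c)    { c->ty += 1.0f; }

/* Draw using whatever state the objects before it have accumulated. */
void op_draw_point(context_t *c)
{
    c->last_x = c->tx;
    c->last_y = c->ty;
    c->n_drawn++;
}

/* Pass one context down a chain of objects, in order. Nothing is copied
   between the objects; they all work on the same context. */
void run_chain(context_t *c, const context_op *chain, size_t n)
{
    assert(c != NULL && chain != NULL);
    for (size_t i = 0; i < n; i++) chain[i](c);
}
```

The order of the chain matters here: moving after the draw would leave 
the point where it was, which is the state machine feel the text 
describes.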