notes/thread_sync.txt

Thread synchronization
======================

Alternative solutions to traditional lock + memory ordering:

Use a memory area for each "channel endpoint"
---------------------------------------------
- Each channel endpoint has ONE sending/producing thread.
- Each channel endpoint has ONE receiving/consuming thread.
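
A minimal sketch of what one such endpoint could look like, written in
Python purely for illustration. The names SpscRing, try_send, try_recv and
demo are hypothetical and do not come from these notes. The point it tries
to show is that the producer is the only writer of the tail index and the
consumer is the only writer of the head index, so no lock is taken. CPython
happens to keep the two stores in try_send in order; a native
implementation would presumably still need the tail update to become
visible only after the slot has been written.

```python
import threading
import time


class SpscRing:
    """Fixed-capacity single-producer/single-consumer ring (illustrative)."""

    def __init__(self, capacity):
        # One slot stays unused so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_send(self, item):
        """Producer side: return False instead of blocking when full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = next_tail  # publish only after the slot is filled
        return True

    def try_recv(self):
        """Consumer side: return (True, item), or (False, None) when empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return True, item


def demo(count=1000):
    """Push `count` integers through one endpoint using two threads."""
    ring = SpscRing(capacity=8)
    received = []

    def producer():
        for i in range(count):
            while not ring.try_send(i):
                time.sleep(0)  # ring is full: yield and retry

    def consumer():
        while len(received) < count:
            ok, item = ring.try_recv()
            if ok:
                received.append(item)
            else:
                time.sleep(0)  # ring is empty: yield and retry

    workers = [threading.Thread(target=producer),
               threading.Thread(target=consumer)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return received
```

Because each index has exactly one writer, the ONE sender / ONE receiver
restriction above is what makes the lock-free layout possible; a second
producer would race on the tail index.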

How does this scale? With one 4 KiB page per endpoint and ALL threads
communicating with each other (worst case), every ordered sender/receiver
pair needs its own endpoint, n*(n-1) in total:

    Threads     Memory usage    = n*(n-1)*4KiB
        4           48 KiB
        8          224 KiB
       16          960 KiB
       32        3 968 KiB
       64       16 128 KiB
      128         63.5 MiB
      256        255.0 MiB   <-- here it starts getting really problematic
      512          1.0 GiB
     1024          4.0 GiB
     2048         16.0 GiB
     4096         64.0 GiB

With 32-byte queues (one cache line, on CPUs with 32-byte lines):

    Threads     Memory usage    = n*(n-1)*32
        4          384 B 
        8         1792 B
       16         7680 B
       32           31 KiB
       64          126 KiB
      128          508 KiB
      256         2040 KiB
      512          8.0 MiB
     1024         32.0 MiB
     2048        127.9 MiB
     4096        511.9 MiB
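
The figures in both tables follow from the same n*(n-1)*size formula. A
small sketch that recomputes them is given here; the function names are
hypothetical and chosen only for this example.

```python
def endpoint_count(threads):
    """Number of one-way endpoints when every thread talks to every other."""
    return threads * (threads - 1)


def endpoint_memory_bytes(threads, bytes_per_endpoint):
    """Worst-case memory for all endpoints, in bytes."""
    return endpoint_count(threads) * bytes_per_endpoint


def print_table(bytes_per_endpoint):
    """Print thread count and memory use (in KiB) for 4 .. 4096 threads."""
    threads = 4
    while threads <= 4096:
        total = endpoint_memory_bytes(threads, bytes_per_endpoint)
        print(f"{threads:8d} {total / 1024:16.1f} KiB")
        threads *= 2


if __name__ == "__main__":
    print_table(4096)  # one 4 KiB page per endpoint
    print_table(32)    # one 32-byte queue per endpoint
```

Memory grows roughly with the square of the thread count, so shrinking the
endpoint from 4 KiB to 32 bytes only moves the problem out by a constant
factor of 128.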

Another problem is that each thread needs to check one queue per sender.
That's a lot of queues to loop through: n-1 per receiver in the worst case
(unless it is possible to send "messages" between CPU threads).
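
To make the polling cost concrete, here is a sketch of a receiver scanning
its per-sender queues. collections.deque stands in for the real endpoints,
and poll_once and the inbox layout are assumptions made for this example,
not something defined in these notes.

```python
from collections import deque


def poll_once(inboxes):
    """Scan every per-sender inbox once.

    `inboxes` maps a sender id to that sender's queue. Returns a list of
    (sender, message) pairs, taking at most one message from each inbox.
    """
    found = []
    for sender, inbox in inboxes.items():
        if inbox:  # an empty inbox still costs one check
            found.append((sender, inbox.popleft()))
    return found


if __name__ == "__main__":
    # Receiver 0 in an 8-thread system has 7 inboxes to look at.
    inboxes = {sender: deque() for sender in range(1, 8)}
    inboxes[5].append("hello")
    print(poll_once(inboxes))
```

Even when only one sender has something to say, the receiver touches all
of its inboxes, so the work per poll grows with the number of senders
rather than with the number of pending messages.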