Thread synchronization
======================
Alternative solutions to traditional lock + memory ordering:
Use a memory area for each "channel endpoint"
---------------------------------------------
- Each channel endpoint has ONE sending/producing thread.
- Each channel endpoint has ONE receiving/consuming thread.
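The one-producer, one-consumer rule above can be sketched as a small ring buffer. This is only an illustration under assumptions: the name SpscChannel and its methods are hypothetical, and CPython's interpreter lock hides the memory-ordering questions a native implementation would face (in C or Rust the index updates would need release/acquire semantics or equivalent barriers).

```python
class SpscChannel:
    """Fixed-size ring buffer for exactly one producer and one consumer thread.

    Hypothetical sketch: each index has a single writer, so no lock is taken.
    """

    def __init__(self, capacity):
        # One slot stays unused so that head == tail always means "empty".
        self._slots = [None] * (capacity + 1)
        self._head = 0  # next slot to read; written only by the consumer
        self._tail = 0  # next slot to write; written only by the producer

    def try_send(self, item):
        """Producer side: returns False if the queue is full."""
        next_tail = (self._tail + 1) % len(self._slots)
        if next_tail == self._head:
            return False
        self._slots[self._tail] = item
        self._tail = next_tail  # publish only after the slot is filled
        return True

    def try_recv(self):
        """Consumer side: returns (True, item), or (False, None) if empty."""
        if self._head == self._tail:
            return False, None
        item = self._slots[self._head]
        self._slots[self._head] = None
        self._head = (self._head + 1) % len(self._slots)
        return True, item
```

Because only the producer writes the tail index and only the consumer writes the head index, neither side ever overwrites state owned by the other.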
How does this scale? With 4 KiB pages and ALL threads communicating with
each other (worst case):
  Threads   Memory usage = n*(n-1)*4KiB
        4       48 KiB
        8      224 KiB
       16      960 KiB
       32    3 968 KiB
       64   16 128 KiB
      128     63.5 MiB
      256    255.0 MiB  <-- here it starts getting really problematic
      512      1.0 GiB
     1024      4.0 GiB
     2048     16.0 GiB
     4096     64.0 GiB
With 32-byte queues (one cache line):
  Threads   Memory usage = n*(n-1)*32
        4      384 B
        8    1 792 B
       16    7 680 B
       32       31 KiB
       64      126 KiB
      128      508 KiB
      256    2 040 KiB
      512      8.0 MiB
     1024     32.0 MiB
     2048    127.9 MiB
     4096    511.9 MiB
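The figures in both tables follow from one channel per ordered (sender, receiver) pair, i.e. n*(n-1) channels. A minimal sketch for recomputing them at other thread counts or queue sizes; the function names are made up for this example:

```python
def channel_memory_bytes(threads, bytes_per_channel):
    """Memory for one channel per ordered (sender, receiver) pair of threads."""
    return threads * (threads - 1) * bytes_per_channel


def format_size(size_bytes):
    """Render a byte count using binary units (KiB, MiB, GiB)."""
    for unit, scale in (("GiB", 2**30), ("MiB", 2**20), ("KiB", 2**10)):
        if size_bytes >= scale:
            return f"{size_bytes / scale:.1f} {unit}"
    return f"{size_bytes} B"


# Print both variants (4 KiB pages and 32-byte queues) for a few thread counts.
for n in (4, 64, 1024, 4096):
    print(n,
          format_size(channel_memory_bytes(n, 4096)),
          format_size(channel_memory_bytes(n, 32)))
```

Growth is quadratic in the number of threads, so shrinking the per-channel size by a factor of 128 only postpones the problem by roughly a factor of 11 in thread count.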
Another problem is that each receiving thread needs to check one queue per
sender. That's a lot of queues to loop through: n-1 per thread on every pass.
(Unless it is possible to send "messages" between CPU threads.)
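To make the polling cost concrete, here is a rough sketch of what one receiver does on each pass. It uses collections.deque as a stand-in for the per-sender queues, and poll_once is a hypothetical helper, not part of any existing design:

```python
from collections import deque


def poll_once(inboxes):
    """Check every per-sender queue once; return the (sender, message) pairs found."""
    received = []
    for sender, queue in inboxes.items():
        if queue:  # every queue is visited, even though most are empty
            received.append((sender, queue.popleft()))
    return received


# Receiver 0 in a 4-thread system owns one inbox per other thread.
inboxes = {sender: deque() for sender in (1, 2, 3)}
inboxes[2].append("hello")
print(poll_once(inboxes))  # prints [(2, 'hello')]
```

Each pass touches all n-1 inboxes regardless of how many hold a message, which is why a notification mechanism between threads would help.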