08:23:36 -!- The topic of #crawl-dev is: Crawl Development | https://github.com/crawl/crawl | Logs: http://s-z.org/crawl-dev/, temporarily http://crawl.akrasiac.org/logs/cheibriados/ | People with +v have commit access, devs on bridged discord as well | General Crawl-related chat to #crawl | Long stuff to a pastebin service, please
08:23:36 -!- The topic of #crawl is: Play Dungeon Crawl Stone Soup online now! Type ??online for instructions, ??lg / !lg for play stats | PM Sequell for long queries | http://crawl.develz.org | FooTV game replays: ??footv for instructions | #crawl-dev for dev discussion, #crawl-offtopic for offtopic
08:26:43 <gammafunk> %git
08:26:43 gotta wait for it to sync all the changes since it went down
08:26:43 <Cerebot> DracoOmega * 0.33-a0-1118-g28f6e29f1b: Fix being able to equip all shields on any size of character (kfkqmtg) (14 hours ago, 1 file, 5+ 4-) https://github.com/crawl/crawl/commit/28f6e29f1bab
08:27:30 <gammafunk> oh
08:28:04 <gammafunk> ok, looks like it's working again
08:28:11 <gammafunk> yeah, it was churning cpu so I was worried
08:28:50 <gammafunk> seems to be working, but we'll have to keep an eye on it
08:28:53 optimized it isn't 🙂
08:29:35 I know advil did some speedups to the git code, but there's still room for a lot of optimization in there
08:51:21 Who are the developers most comfortable with Apple on this project? I'm asking so I know who to look for in the past in the commit log. While my macOS PR cooks in review, I'm thinking about iOS now
08:52:01 I want to make sure I understand the history of how the Apple-specific port was added so I don't undo someone else's work out of ignorance
08:57:47 <gammafunk> Luigi: Someone made an iOS port a couple of years ago, but the Apple licensing requirements for being listed on the store made it a non-starter. We're not willing to try to relicense/dual-license the source code to be listed on the store
08:58:31 I am presuming individual enthusiasts would compile that target for their personal device, like the controversial Delta emulator did for years in the FOSS space
08:59:42 I figured the store was likely a non-starter, but it's not like I'm trying to make any money off this work :p
09:01:04 This does, however, enable any random Joe with a business to upload it to the store themselves, which may be a concern
09:01:15 But that's also the case with the macOS app right now
09:02:04 <gammafunk> well, I think the people most involved with macOS are around/active less frequently these days. advil is probably the best person to ask
09:21:22 Understood. I'll check out their commit history
10:58:36 <mumra> I didn't test my emscripten build running on iOS, but assuming it does (Safari supports everything it needs, afaiaa), it can be packaged as a PWA and someone can install it as a home screen app, bypassing any app store requirements. Worth testing anyway
11:07:57 <mumra> If you have an iPhone or iPad handy, please give it a try in Safari: https://crawl-webtiles-2.netlify.app/ It's certainly running nicely on my Android. It could do with a bit of UI tweaking to play a bit better on a touchscreen, but it is playable. (Just be aware, you have to wait quite a while on first load for the map cache to generate. We could add a progress bar there, tbh)
11:11:32 <mumra> (URL is maybe misleading; it's not like client/server webtiles, it's running fully in the client. Plan is to make a server interface later so it can be a webtiles frontend, just using actual desktop rendering code instead of the completely different JS renderer webtiles uses now)
11:12:44 <gammafunk> I just tried it in my normal (linux) desktop browser. Very responsive! Very awkward if I leave fullscreen, since then I get scrollbars that don't work well with the game's still-larger rendering area
11:12:49 <mumra> Allowing it to be installed as a PWA should be just a matter of some configuration
11:13:38 <mumra> Yeah, changing canvas size is buggy in general right now; I just didn't look into how to fix it
11:14:09 <mumra> To play in landscape I had to rotate my phone and then refresh to let it initialise at the correct resolution
11:14:50 <gammafunk> I didn't test it for more than like 1.5 levels, but it certainly felt like I was playing Tiles locally
11:16:28 <mumra> Yep, I was super impressed with how it turned out, and there was extremely little stuff I had to do any special-casing for. File system handling is a bit weird, but once I got there it "just worked" and uses the browser's IndexedDB to persist between refreshes
11:16:43 <gammafunk> hrm, one other weird bug is that my browser really didn't want to exit full screen after I exited the app (in full screen)
11:17:05 <gammafunk> I had a fullscreen blank window that didn't respond to [esc] to leave fullscreen
11:17:19 <mumra> There's loads of stuff that could be optimised, e.g. downloading assets asynchronously would be better; right now it basically downloads a 17mb data file for the file system and mounts that virtually
11:17:43 <gammafunk> I guess currently for webtiles the only major assets are the tile sheets
11:17:50 <gammafunk> you don't have to get all those game logic files as well
11:18:05 <gammafunk> but if this did become a client, you'd similarly not be downloading all those files
11:18:12 <gammafunk> also maybe you didn't advcomp those tilesheets
11:18:14 <mumra> Hmm, that sounds like a peculiarity of your linux browser perhaps
11:18:26 <gammafunk> well, it was a very recent chrome fwiw
11:19:12 <gammafunk> I think I managed to trigger closing the tab with ctrl-w, and then I was able to get a right-click menu to exit full screen in a remaining tab
11:19:17 <mumra> Sure, a big chunk of the data is des files and language files. We also shouldn't package all the translations, just download on demand
11:20:41 <mumra> Yeah, that's really weird; definitely didn't get in that situation on windows chrome. Shouldn't happen (I mean, the client isn't doing anything that any other fullscreen webgl game wouldn't be doing)
11:36:33 lurklord (L5 GrEE) ASSERT(!in_bounds(you.pos()) || !cell_is_solid(you.pos()) || you.wizmode_teleported_into_rock) in 'movement.cc' at line 799 failed. (D:2)
11:49:00 <mumra> We could also start sending packets in a packed binary format, which I think would be a massive latency improvement over json (and would be rather more natural to implement in the codebase) ... as long as we can get client and server to agree on endianness
12:05:25 <gammafunk> given that we deflate the json already for server->client, would it be that much of an improvement?
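The binary-vs-deflated-JSON question above is easy to probe empirically: take a representative message and compare raw JSON, deflated JSON, and a hand-packed binary encoding side by side. A minimal sketch; the message shape and the packed field layout are invented for illustration and are not the actual webtiles protocol:

```python
# Compare raw JSON, deflated JSON, and a packed-binary encoding for a
# small webtiles-style message. The cell layout (x/y bytes + 1-byte glyph)
# is hypothetical.
import json
import struct
import zlib

msg = {"msg": "map", "cells": [{"x": x, "y": 10, "g": "."} for x in range(20)]}

raw = json.dumps(msg).encode("utf-8")
deflated = zlib.compress(raw, 1)
# One message-type byte, then three bytes per cell.
packed = bytes([1]) + b"".join(
    struct.pack("BBc", c["x"], c["y"], c["g"].encode()) for c in msg["cells"])

print(f"json: {len(raw)} B, deflated: {len(deflated)} B, packed: {len(packed)} B")
```

The interesting comparison is deflated vs packed: packing avoids both the repeated keys and the CPU cost of compressing, which is mumra's argument.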
12:07:44 a lot of json parsers/serializers support CBOR; I don't know what crawl uses, but it could be a quick way to test an improvement
12:08:11 (but also, I'm not a regular contributor and I don't have much context, so feel free to disregard anything I say)
12:34:26 <mumra> Well, there is network overhead and then there is CPU time in zipping/unzipping the messages
12:35:34 <mumra> We can easily inspect the packets arriving in webtiles to see how efficient the compression is (I still believe binary packing will be more efficient, as the compression is not highly tuned, for CPU reasons of course)
12:36:36 <Cgettys> It's now equivalent to a Level 1 aux attack with 5 damage. If I dropped it down to, say, 3 damage, is that less overly strong? Or do you think dropping the AC is a better choice?
12:36:49 <mumra> Decompression blocks the main thread in the browser, and compression will possibly be blocking at the python layer
12:37:35 <Cgettys> Depends on the library? Also may depend on the python version (see the newest python versions with GIL-free modes)
12:37:36 <mumra> We do know that there are lag spikes when a new spectator joins and the entire map has to be sent to them
12:40:42 <Monkooky> at some point gonna give a more substantial comment on the PR page
12:42:45 <mumra> I will also say, there's a reason google created protobuf (a packed binary format) for high-throughput network applications where compressed json was already readily available and extremely well supported in the kind of systems consuming it
12:44:37 <mumra> It has other things going for it, but very small and efficient messages is one of its selling points
12:46:23 <Cgettys> Really this seems like a case of "get a profiler on it"
12:46:34 <Cgettys> see where the bottleneck is
12:46:38 <Cgettys> easier said than done, I know
12:51:18 <Cgettys> I look forward to it 🙂
12:56:34 <mumra> Easier said than done when bottlenecks could be in C++, in the python server, or in the browser (or possibly all three) 🙂 But yes, useless trying to optimise without profiling
12:56:57 <Cgettys> Well, that's why you need a system-wide low-overhead profiler...
12:57:52 <Cgettys> Of course, this is the sort of mythical thing that is well known to exist inside cloud hyperscalers or other large companies where they have debug symbols for everything they ship indexed somewhere
12:58:03 <Cgettys> But it is less easily found outside of there
12:58:09 where's dtrace when I need it?
12:59:31 <Cgettys> There's a port of it somewhere, I thought
12:59:58 <Cgettys> It's definitely still possible to do, mind you
13:00:08 <Cgettys> it's just less easy than it is in such an ecosystem
13:00:25 <Cgettys> where it's literally "press button, trace collected, all symbols magically can be resolved"
13:01:16 <Cgettys> Actually, geekosaur, am I getting my history backwards?
13:01:21 <Cgettys> was it Linux/bsd first?
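The CBOR idea from 12:07 can be size-tested the same way, assuming the third-party cbor2 package (an assumption; crawl does not currently use it):

```python
# Size check: CBOR vs JSON for a webtiles-style message, using the
# third-party cbor2 package (an assumption, not a crawl dependency).
import json
import zlib

import cbor2

msg = {"msg": "player", "hp": 13, "hp_max": 20, "pos": {"x": 3, "y": 7}}

as_json = json.dumps(msg).encode("utf-8")
as_cbor = cbor2.dumps(msg)

print("json:", len(as_json), "cbor:", len(as_cbor))
# CBOR still carries every string key, so deflated JSON may stay competitive:
print("json deflated:", len(zlib.compress(as_json, 1)))
```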
13:01:27 <Cgettys> there's https://github.com/microsoft/DTrace-on-Windows now 😉
13:02:13 <Cgettys> Ah, right, https://github.com/opendtrace/opendtrace came first
13:02:32 <Cgettys> Or came before the Windows port, anyway
13:03:30 Solaris, ported to freebsd; a Linux port exists, but its license is incompatible with the GPL
13:03:50 <Cgettys> Ah, right, right
13:03:52 and not on windows at all, of course
13:04:01 <Cgettys> Well, that last bit isn't actually true any more
13:04:20 <Cgettys> https://github.com/microsoft/DTrace-on-Windows exists now 🙂
13:05:41 huh
13:07:05 <Cgettys> ebpf is also experimentally around these days: https://github.com/microsoft/ebpf-for-windows
13:09:38 actually I think macs still have it as well, but it's crippled, so for example you can't trace iTunes to get decrypted audio files
13:10:16 <Cgettys> Right, software protection schemes
13:10:25 <Cgettys> though more and more of that stuff moves towards hardware protection
13:10:38 <Cgettys> Some of it for better reasons than others
13:12:15 <Cgettys> E.g. protecting iTunes... some people will be for, some against (my personal opinion irrelevant to this conversation)
13:12:29 <Cgettys> Protecting your personal data from malware... generally a good thing
13:41:04 <gammafunk> %git de2045381eb506fd555763a777b6d54c05f5a1f4
13:41:05 <Cerebot> DracoOmega * 0.33-a0-673-gde2045381e: Remove all stat draining from the game (2 months ago, 42 files, 83+ 529-) https://github.com/crawl/crawl/commit/de2045381eb5
13:42:25 <gammafunk> @dracoomega > -Summon horrible things now has a chance to give a temp Weak-Willed > mutation instead of int drain. This is... slightly silly and low-impact > (if thematically appropriately), but just a placeholder for a new type of > effect I plan to implement later. Do you still have plans to do anything for XXX's debuff for this release? Otherwise, I was going to update its description: > Opens a gate to the Abyss and calls through two or more hideous abominations > from that dreadful place. The opening of the gate may drain a portion of the > caster's intellect. The number of abominations summoned increases with spell > power.
13:53:38 <dracoomega> Did I really not change its description when I changed it? That was an oversight. (Yeah, Banes are definitely not getting into this version, and I didn't really have another 'stop-gap' change in mind for them)
13:57:21 <gammafunk> ok, I'm pushing a cleanup right now then
14:04:46 gammafunk * 0.33-a0-1119-g2614225963: Fix a spell description (CarefulOdds) (5 minutes ago, 3 files, 4+ 6-) https://github.com/crawl/crawl/commit/261422596309
15:25:24 -!- wvc5 is now known as wvc
15:32:41 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1099-g556cab7ea1: nerf the damage some more (30 seconds ago, 1 file, 1+ 1-) https://github.com/crawl/crawl/commit/556cab7ea116
16:39:29 Unstable branch on underhound.eu updated to: 0.33-a0-1119-g2614225963 (34)
17:01:21 <mumra> A quick and dirty bit of profiling the browser and digging into the webtiles implementation ...
* Taking a single step in the dungeon in a typical scenario (just exploring, nothing hectic going on, 1 spectator): 40ms from pressing a key to receiving a response message from the server, then 50ms to update the view. 90ms is definitely a bit sluggish overall for a single interaction
* Emscripten build: the redraw itself is 18ms, so it's blazing fast by comparison (and before any serious optimisation has even been attempted). However, quite a lot of time is spent updating the game world, but remember this is the entire game being run in the browser. So currently it ends up roughly the same overall for a single interaction. But this means that using emscripten just to render the client, with game updates on the server, looks like it'll definitely be faster already.
17:05:11 <mumra> Then I dug into the python code a little. I hadn't realised earlier that compression is being manually handled from python, invoking zlib; I assumed it was the standard gzip compression normally used by webservers, but that's not actually supported in websockets. The inflation happening in the browser at least appears to be negligible, but I am seriously questioning the performance of using zlib here, especially as it appears to be compressing the message separately per listener: even though the same message is sent out to everyone (except for a new spectator joining), it's getting compressed over and over, and it's not being done async, so this appears to be blocking the whole python thread until all clients have been updated
17:07:01 <mumra> There is even this comment:
```python
def handle_process_message(self, msg, send):  # type: (str, bool) -> None
    # special handling for map messages on a new spectator: these can be
    # massive, and the deflate time adds up, so only send it to new
    # spectators. This is all a bit heuristic, and probably could use
    # a new control message instead...
    # TODO: if multiple spectators join at the same time, it's probably
    # possible for this heuristic to fail and send a full map to everyone
```
17:08:04 <mumra> Obviously webtiles was developed at a time when message sizes mattered a lot more vs internet speeds; I am actually wondering if these days it might just perform better with deflate disabled
17:10:07 <mumra> The messages are generally not that large anyway, and I can find comments like these even from 8 years ago suggesting that perhaps it's a detrimental strategy https://www.reddit.com/r/gamedev/comments/76xqyv/comment/dohj1z8/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button
17:11:45 <mumra> Further, there is more recently a compression extension for websockets, supported in all the major browsers, which would perhaps also be more performant https://www.rfc-editor.org/rfc/rfc7692
17:12:44 <mumra> Still, just natively building binary packets and unmarshalling on the other side is likely better than having to do any compression
17:13:50 <mumra> Anyway, still not drawing any definite conclusions around this, but I would be very interested to test a webtiles server with deflate disabled, at least to see if that 40ms response comes down at all
17:28:36 <mumra> (oh, this testing was on underhound ... obviously server specs are going to have an impact if it turns out this is part of the issue)
17:41:32 <Cgettys> Compressing over and over definitely seems silly
17:42:20 <Cgettys> Ah, interesting
17:42:23 <Cgettys>
```python
# note: a compressed stream is stateful, you can't use this
# compressobj for other sockets
compressed = self._compressobj.compress(binmsg)
```
17:43:45 <mumra> it'd be better to compress the message inside crawl code and then send that to all the clients, really
17:43:55 <Cgettys> Was about to say something along those lines
17:44:06 <Cgettys> Or even if it is not in crawl code, python is probably not the ideal tool for this
17:44:14 <Cgettys> Or at least, CPython is probably not
17:44:26 <mumra> still ... it might simply not be saving any overall time either way
17:45:44 <Cgettys> Yeah, again, depends on the servers
17:45:46 <Cgettys> and on the clients
17:45:49 <mumra> python is calling zlib, so I assume the compression is as fast as possible
17:45:52 <Cgettys> bandwidth is a lot cheaper these days, but...
17:45:54 <Cgettys> ehhhh
17:45:57 <Cgettys> depends on what zlib we're using
17:46:02 <Cgettys> are we even using zlib-ng?
17:46:04 <Cgettys> what settings?
17:46:06 <Cgettys> yadayadayada
17:48:02 <mumra> well, it's def not zlib-ng
17:48:12 <Cgettys> Well yeah, probably not
17:48:19 <Cgettys> But that's exactly my point 😉
17:55:20 <Cgettys> Honestly, I'm tempted to see what I can do with PyO3/Maturin
18:03:53 <Cgettys> Those are good-looking compression ratios
18:03:57 <Cgettys> 79%
18:06:43 <mumra> you mean that's the current compression?
18:07:24 <Cgettys> roughly, yeah
18:07:37 <Cgettys> if you run webtiles locally or can look at the console output, it has stats
18:07:50 <Cgettys> 79% for a just-started game
18:08:02 <Cgettys> 84-ish% for Vaults 5
18:08:57 <Cgettys> 2025-03-22 18:07:12,579 INFO: #21 Socket closed. (275.2 kilobytes sent, compression ratio 86.64%) Not sure how I got that big of a packet; most reloads seem to be 38kb, 20kb
18:09:58 <Cgettys> oh, I see
18:10:01 <Cgettys> stats are when closed
18:10:16 <Cgettys> so a reload directly gives stats for just 1 packet
18:10:40 <mumra> yeah, for a big message you'll get much more benefit from compression
18:10:55 <mumra> but when I take a single step the message is only like 200-300 bytes
18:11:01 <Cgettys> Sure, that bit I got, but the compression ratio seems to hold up overall
18:11:15 <Cgettys> that's the tradeoff this compression scheme makes / why it's compressing independently for each client
18:12:12 <Cgettys> Ratio actually is better for the player as opposed to the short-lived test spectators
18:12:13 <Cgettys> 2025-03-22 18:11:49,834 INFO: #0 Socket closed. (709.3 kilobytes sent, compression ratio 91.47%)
18:12:16 <mumra> like maybe it makes sense to compress bigger messages - but still, a bigger message is also going to take longer to compress
18:12:48 <Cgettys> but that's exactly the point, it's not compressing the messages only
18:12:51 <Cgettys> it's compressing the stream
18:12:57 <mumra> and a message taking time to send is only affecting the client, not holding up the whole server while it sends
18:13:07 <Cgettys> if you see similar small messages in a row, each gets compressed
18:15:20 The emscripten site is frequently blocking paints on desktop Safari while loading. Once initialized, the application is flawless AFAICT, but I am not an experienced player
18:16:51 The experience on iOS Safari is not similar. The canvas is not adapting to the layout and is instead approximately rectangular, with the UI elements cut off
18:17:22 Without a keyboard I do not seem able to navigate the iOS Safari application, though I'm unsure if this is intended
18:17:58 <mumra> in portrait? yeah, crawl UI code has never been made responsive for that aspect
18:18:43 <mumra> and yep, blocking paints during load is normal right now; it's a heavy process. Could probably add some microsleeps to let paints through
18:18:52 landscape causes the screen to overdraw as well
18:19:02 I can however now navigate the application
18:19:14 <mumra> hmm, ok, maybe an issue with pixel density on retina
18:20:11 <mumra> I have a scaling issue with normal desktop crawl on a weird-sized screen I have, tbh, that I didn't try to debug yet
18:21:16 <mumra> on android I can navigate with the touchscreen, although it's janky trying to scroll menus
18:21:45 <Cgettys>
2025-03-22 18:19:21,621 INFO: #4 test started watching test (1).
2025-03-22 18:19:21,623 INFO: #4 Message duration: 1266836 ns
2025-03-22 18:19:21,623 INFO: #0 Message duration: 183202 ns
2025-03-22 18:19:21,623 INFO: #4 Message duration: 162052 ns
2025-03-22 18:19:21,624 INFO: #4 Message duration: 141692 ns
2025-03-22 18:19:21,624 INFO: #0 Message duration: 339854 ns
2025-03-22 18:19:21,625 INFO: #4 Message duration: 284623 ns
2025-03-22 18:19:21,628 INFO: #0 Message duration: 403735 ns
2025-03-22 18:19:21,628 INFO: #4 Message duration: 394975 ns
2025-03-22 18:19:21,638 INFO: #4 Message duration: 230543 ns
2025-03-22 18:19:21,700 INFO: #4 Message duration: 304464 ns
2025-03-22 18:20:01,249 INFO: #4 test stopped watching test.
2025-03-22 18:20:01,249 INFO: #0 Message duration: 312953 ns
2025-03-22 18:20:01,250 INFO: #4 Message duration: 175373 ns
2025-03-22 18:20:01,250 INFO: #4 Message duration: 193633 ns
2025-03-22 18:20:02,233 INFO: #4 Message duration: 263284 ns
2025-03-22 18:20:06,019 INFO: #4 test started watching test (1).
2025-03-22 18:20:06,021 INFO: #4 Message duration: 1551369 ns
2025-03-22 18:20:06,022 INFO: #0 Message duration: 218233 ns
2025-03-22 18:20:06,022 INFO: #4 Message duration: 186292 ns
2025-03-22 18:20:06,022 INFO: #4 Message duration: 156492 ns
2025-03-22 18:20:06,027 INFO: #0 Message duration: 425775 ns
2025-03-22 18:20:06,027 INFO: #4 Message duration: 594377 ns
2025-03-22 18:20:06,032 INFO: #4 Message duration: 380435 ns
this is with compression, including both compressing and sending
18:21:53 <Cgettys> give me a minute and I'll have some numbers for without
18:23:37 <mumra> I mean ... the fact that it's rendering anything at all on iOS proves that the tech actually works, and everything else is just platform-specific rendering / interface nuances that can be debugged
18:24:14 <Cgettys> without compression
18:24:14 <Cgettys> https://cdn.discordapp.com/attachments/747522859361894521/1353177488464937030/message.txt?ex=67e0b43e&is=67df62be&hm=c540777a09cc3a2bdfd375d6f98dbf16f42ee351bcc0371c045d458930c990cb&
18:24:39 <Cgettys> hard-coded the compression code to false, hence it lying
18:24:53 <Cgettys> there is a header I could set but haven't gotten that far; should probably use the webserver test script
18:24:55 <mumra> I will try a PWA configuration at some point and see how it works installed as a home screen app (there is an ipad I can get hold of to test)
18:25:19 <Cgettys> initial impression: it's slower, but not so much slower as to necessarily not make sense
18:25:28 <Cgettys> especially given that this is local, where network latency is minimal
18:27:31 <mumra> you mean, it's faster without compression?
18:28:16 <mumra> remember the issue is not only the time for wire transfer. the issue is that, especially with multiple spectators, all this time spent in compression adds up to delaying when the next message can start sending to the actual player
18:28:42 <Cgettys> If your webserver is on the same machine as your browser, yes, compression takes more time than not compressing. That doesn't mean it holds in the case where you end up having to wait for round trips to the client
18:28:49 <Cgettys> yes, there's buffering and the like, but it's not free
18:29:07 <Cgettys> But also it's not horrendously slower
18:29:21 <mumra> but the server is not waiting for a message to completely send before it starts sending the next message
18:29:34 <mumra> however, it is waiting for one compression job to complete before it can start the next one
18:29:49 <Cgettys> Let me phrase this a different way
18:29:59 <mumra> so looking at the total message time doesn't give you a full picture
18:30:12 <Cgettys> compression is a noticeable portion of time; I'm about to start playing with the speed/quality knob to see if there's a better point for us to be at
18:30:17 <Cgettys> which there almost certainly is
18:30:19 <Cgettys> But
18:30:31 <Cgettys> it is not so sizable that I believe it to be the bottleneck
18:30:32 <mumra> anything that's blocking the main thread is not scalable
18:30:50 <mumra> especially when you consider a webtiles server with multiple players, each with multiple spectators
18:31:23 <mumra> just sending a message over the websocket should be async and not be blocking
18:31:25 <Cgettys> Sure. Agreed. But my point is that even turning compression off is not gonna make this scalable; there's too much other CPU-bound work on the thread, it looks like
18:31:39 <mumra> but what else is the CPU doing?
18:31:49 <Cgettys> Copying bytes around in Python inefficiently?
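To make the thread-blocking point concrete: CPython's zlib releases the GIL while compressing, so in principle the deflate work could move to a worker thread while the tornado/asyncio loop keeps servicing other sockets. A sketch only; CompressedSocket and its wiring are hypothetical names, and webtiles does not do this today:

```python
# Sketch: deflating off the event loop. Each socket's deflate stream is
# stateful (per the comment quoted at 17:42), so sends on one socket must
# stay serialized; a one-worker executor per connection guarantees that.
import asyncio
import zlib
from concurrent.futures import ThreadPoolExecutor

class CompressedSocket:
    def __init__(self, write_message):
        # raw deflate stream, matching the compressobj setup seen above
        self._compressobj = zlib.compressobj(1, zlib.DEFLATED, -zlib.MAX_WBITS)
        self._worker = ThreadPoolExecutor(max_workers=1)
        self._write_message = write_message  # e.g. tornado's write_message

    def _deflate(self, binmsg: bytes) -> bytes:
        # CPython's zlib drops the GIL during compression, so this can
        # overlap with event-loop work when run on another thread.
        return (self._compressobj.compress(binmsg)
                + self._compressobj.flush(zlib.Z_SYNC_FLUSH))

    async def send(self, binmsg: bytes):
        loop = asyncio.get_running_loop()
        compressed = await loop.run_in_executor(self._worker, self._deflate, binmsg)
        self._write_message(compressed, binary=True)
```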
18:31:53 <Cgettys> Idk, still digging ;D
18:34:44 <Cgettys> granted, might be no compression makes sense if it means we actually really totally can and will start reusing messages we send
18:34:56 <Cgettys> but that may require some more substantial rearchitecting
18:35:01 <mumra> (I would be ultimately tempted to build a new version of the server in nodejs, where I know exceedingly well how to do this very scalably, just using streams and never having to copy data)
18:35:28 <Cgettys> I'd be tempted to do it in Rust instead 😄
18:35:53 <Cgettys> Everything has its own scalability pitfalls
18:36:39 <mumra> and it's certainly possible to scale in python, it's just not my forte
18:38:23 <gammafunk> would this be in TypeScript?
18:38:28 <mumra> (something I noticed in the browser profiling is unicode conversion taking a surprising amount of time ... and I notice a few calls to do similar in python ... wonder if that's taking up CPU too?)
18:38:43 <gammafunk> (I know extremely little about javascript land in general, aside from not liking what I see of the syntax)
18:38:49 <Cgettys>
2025-03-22 18:38:15,924 INFO: #2 test started watching test (1).
2025-03-22 18:38:15,925 INFO: #2 Message duration: 921702 ns
2025-03-22 18:38:15,925 INFO: #1 Message duration: 147752 ns
2025-03-22 18:38:15,925 INFO: #2 Message duration: 126321 ns
2025-03-22 18:38:15,925 INFO: #2 Message duration: 111271 ns
2025-03-22 18:38:15,927 INFO: #1 Message duration: 368015 ns
2025-03-22 18:38:15,927 INFO: #2 Message duration: 285783 ns
2025-03-22 18:38:15,930 INFO: #1 Message duration: 461846 ns
2025-03-22 18:38:15,931 INFO: #2 Message duration: 329824 ns
2025-03-22 18:38:15,936 INFO: #2 Message duration: 323035 ns
2025-03-22 18:38:16,016 INFO: #2 Message duration: 223673 ns
2025-03-22 18:38:22,904 INFO: #1 Message duration: 252023 ns
2025-03-22 18:38:22,904 INFO: #2 Socket closed. (44.6 kilobytes sent, compression ratio 80.83%)
18:38:52 <Cgettys> So here's the deal
18:39:10 <Cgettys> we get the vast, vast majority of the compression benefit at zlib compression level 1
18:39:17 <Cgettys> from my testing anyway, with the version of zlib I have
18:39:19 <Cgettys> the default is 6
18:39:28 <mumra> yeah, I use all typescript these days, on server or in browser. vanilla javascript scares me now
18:40:01 <Cgettys> And while I haven't done a super principled statistical analysis
18:40:15 <Cgettys> it's pretty clear that compression level = 1 costs hugely less than compression level = 6
18:40:19 <mumra> makes sense, probably most of the compression is just eliminating the repeated json keys
18:40:36 <Cgettys> and is fairly negligible next to the costs we see either way
18:40:53 <Cgettys> And I suspect in a real-world scenario the 80% compression ratio is worth that overhead
18:41:06 <Cgettys> So I think step zero here is a 1-liner
18:42:01 <Cgettys> to change
```python
self._compressobj = zlib.compressobj(zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS)
```
to
```python
self._compressobj = zlib.compressobj(1, zlib.DEFLATED, -zlib.MAX_WBITS)
```
18:44:58 <Cgettys> Ah, ok, I begin to understand, maybe
18:45:04 <Cgettys> Does this mostly affect streamers?
18:45:12 <Cgettys> Like is it "one spectator joining" the problem
18:45:18 <Cgettys> or is it "the 201st when there are already 200"
18:46:30 <gammafunk> for spectators of a single session?
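The level-1-vs-6 claim above is cheap to sanity-check in isolation. A micro-benchmark sketch; the sample payload is a stand-in, not real webtiles traffic:

```python
# Micro-benchmark: zlib output size and time at compression levels 1 and 6.
# "sample" is a stand-in for a representative webtiles JSON payload.
import timeit
import zlib

sample = b'{"msg":"map","cells":[{"x":1,"y":2,"g":"."}]}' * 200

for level in (1, 6):
    out = zlib.compress(sample, level)
    secs = timeit.timeit(lambda: zlib.compress(sample, level), number=1000)
    # 1000 calls taking `secs` seconds total => `secs` ms per call
    print(f"level {level}: {len(out)} bytes "
          f"({100 * (1 - len(out) / len(sample)):.1f}% saved), "
          f"{secs:.3f} ms/call")
```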
18:46:49 <Cgettys> Right (or the 10th, the 20th)
18:46:51 <gammafunk> no session ever comes remotely near 200; they can get close to 10 during a tournament, maybe
18:46:58 <gammafunk> but even that would be very rare
18:47:11 <Cgettys> There's a little bit of quadratic behavior here, though whether it's enough to matter I don't know
18:47:12 <gammafunk> I have seen the odd session with ~30 spectators in extreme cases
18:47:41 <Cgettys> in that, if I am seeing this right, all the existing spectators get a little message about the nth spectator
18:47:50 <Cgettys> e.g. to bump the "n watching" message
18:48:03 <gammafunk> they definitely would, in that the spectator count and spectator list are shown to all spectators
18:48:11 <gammafunk> so it's not just the count, also the username (assuming logged in)
18:48:48 <gammafunk> although one can watch anonymously, in which case it's just a number changing from the perspective of other spectators
18:48:52 <Cgettys> And even with zero compression at all, we're looking at ~0.1ms to send even tiny messages over local network only
18:48:59 <mumra> I think it was Ge0ff who specifically mentioned this spectator-joining issue
18:49:18 <gammafunk> it's been mentioned by a lot of players over the years, particularly by realtime speedrunners
18:49:19 <Cgettys> sometimes .2 to .4 ms
18:49:24 <mumra> and certainly I think whoever it was also mentioned it gets worse with more spectators
18:49:50 <gammafunk> but you have to be some kind of "celebrity" player to see this (which the realtime runners would qualify as)
18:49:57 <Cgettys> let me step back and understand: how much threading is going on here?
18:50:04 <gammafunk> there's a noticeable hit from even one spectator joining
18:50:05 <Cgettys> is the whole darn webserver effectively single-threaded?
18:50:09 <gammafunk> yes
18:50:15 <gammafunk> in terms of non-crawl processes
18:50:19 <gammafunk> obv those are all forked
18:50:22 <Cgettys> Right
18:50:36 <mumra> yeah, python is just one main thread
18:50:40 <gammafunk> the rest is one thread with an asyncio event loop (tornado, but using asyncio for the implementation)
18:50:49 <Cgettys> async doesn't help with CPU-bound work
18:50:51 <mumra> that's why I'm talking about thread blocking being such a huge problem
18:51:06 <Cgettys> compression, any byte manipulation in general
18:51:22 <gammafunk> you can do multithreaded in python obv, but we don't do that, we just use asyncio
18:51:31 <mumra> it could help with compression, because that's a C binding so it could be done on a different thread
18:51:41 <Cgettys> Right, and even there, unless you're running like python 3.13
18:51:53 <Cgettys> the GIL limits the value of threads in Python
18:52:30 <Cgettys> Right, C bindings might be ok, if they let go of the lock
18:52:40 <Cgettys> Though even so, this would get pretty hairy, I think
18:52:53 <Cgettys> it's kinda fine-grained to go schedule onto another thread
18:54:13 <mumra> this is why I'm just really curious to see how it looks in a network situation with compression just disabled, but it sounds like you're seeing other CPU stuff going on anyway? (I'm wondering about the utf8 conversion, as I can't see anything else going on that could possibly be heavy)
18:55:26 <Cgettys> I need to get it under a proper profiler
18:55:39 <Cgettys> but yeah, the time maybe drops 2x at best (and that's 6 -> 1)
18:56:01 <mumra> hmm ... interesting thing you brought up about how there is a separate zlib compression object per socket, effectively. so every time a new spectator joins, a new compression stream gets spun up and immediately has a very large (entire map) object compressed through it
18:56:08 <Cgettys> 1 vs off, very negligible
18:56:19 <Cgettys> Sure, and there is a spikey packet
18:56:24 <Cgettys> but not a 40ms spikey
18:56:31 <Cgettys> like maybe we'd see a 1ms spike
18:56:44 <Cgettys> oh, actually, maybe
18:56:49 <Cgettys> maybe this is a problem with scheduling
18:57:15 <Cgettys> maybe an inherent problem, maybe not
18:57:26 <Cgettys> flush_messages flushes everything for a given stream at once
18:57:37 <Cgettys> probably to reduce overheads
19:21:06 <Cgettys> Also my timing may include too much
19:21:12 <Cgettys> may include waiting
19:22:20 <Cgettys> by which I mean async io shenanigans
19:25:04 <Cgettys> Ok, now I'm getting somewhere
19:25:31 <Cgettys>
2025-03-22 19:24:53,676 INFO: #0 Message duration: 204312 ns
2025-03-22 19:24:53,677 INFO: #0 stack trace duration: 105541 ns
```python
import traceback
cur_stack = traceback.format_stack()
```
These two lines are responsible for ~50% of the duration
20:20:53 gammafunk * 0.33-a0-1120-gbd99899e7f: Fix teleport closets in a vault (notthatgreg) (7 minutes ago, 1 file, 12+ 11-) https://github.com/crawl/crawl/commit/bd99899e7f5b
20:39:16 <Cgettys> The other fun thing, though, is that the time to actually print to console is quite significant at this scale
20:39:16 <Cgettys> and the profiler is being unhelpful 😄
20:55:11 New branch created: pull/4396 (2 commits) https://github.com/crawl/crawl/pull/4396
20:55:12 Cgettys https://github.com/crawl/crawl/pull/4396 * 0.33-a0-1120-g9ef4224c33: Some improvements (5 minutes ago, 1 file, 65+ 66-) https://github.com/crawl/crawl/commit/9ef4224c33fc
20:55:12 Cgettys https://github.com/crawl/crawl/pull/4396 * 0.33-a0-1121-geada247ac2: comment out troubleshooting code (2 minutes ago, 1 file, 2+ 3-) https://github.com/crawl/crawl/commit/eada247ac270
21:02:43 <gammafunk> Cgettys: if x is not None should just be replaced with if x, while we're at it, unless there's actual concern about differentiation between e.g. None/False/0
21:03:16 <gammafunk> I assume whoever wrote the old conditionals was just a bit unsure of python's truth evaluation
21:08:12 <Cgettys> Well, apparently I was also unsure of it 😄
21:12:12 Cgettys https://github.com/crawl/crawl/pull/4396 * 0.33-a0-1122-g937857dd4f: Is not None (2 minutes ago, 1 file, 8+ 7-) https://github.com/crawl/crawl/commit/937857dd4fa5
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1120-g83cda15d20: feat: New Unrand, Bootknife (8 days ago, 5 files, 38+ 2-) https://github.com/crawl/crawl/commit/83cda15d20cf
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1121-g2c3601f5a5: weaken effect (7 days ago, 1 file, 5+ 2-) https://github.com/crawl/crawl/commit/2c3601f5a591
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1122-gcf1090533f: 5 + 2 * shortblades out of 100 paralyse (7 days ago, 1 file, 3+ 2-) https://github.com/crawl/crawl/commit/cf1090533f1a
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1123-g52edf35ccc: xl-independent (TBD) (7 days ago, 1 file, 1+ 1-) https://github.com/crawl/crawl/commit/52edf35ccc17
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1124-ge8cd7405dc: Flavour text (7 days ago, 2 files, 11+ 4-) https://github.com/crawl/crawl/commit/e8cd7405dcce
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1125-g5f4372ee1a: spacing (7 days ago, 1 file, 1+ 1-) https://github.com/crawl/crawl/commit/5f4372ee1a7e
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1126-g1c40d18ec7: dancing -> dance (7 days ago, 2 files, 2+ 2-) https://github.com/crawl/crawl/commit/1c40d18ec77a
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1127-g3998815b99: lower to 5 + SK_SHORT_BLADES (22 hours ago, 1 file, 1+ 1-) https://github.com/crawl/crawl/commit/3998815b99d9
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1128-g615bd862b2: lower to 6 + SK_SHORT_BLADES, and 5 damage for aux (22 hours ago, 1 file, 2+ 2-) https://github.com/crawl/crawl/commit/615bd862b223
21:13:18 Cgettys https://github.com/crawl/crawl/pull/4378 * 0.33-a0-1129-g7f8db8fcaa: nerf the damage some more (6 hours ago, 1 file, 1+ 1-) https://github.com/crawl/crawl/commit/7f8db8fcaacb
21:13:57 <Cgettys> sorry, rebased while I was updating branches
21:14:46 <Cgettys> Fixed now
21:14:59 <Cgettys> Not 100% confident in everything from that branch
21:15:02 <Cgettys> may need to trickle in
21:15:11 <Cgettys> Also not convinced it's the bottleneck
21:15:56 <Cgettys> The worst I've seen is like 1.3ms to send a message
21:16:21 <Cgettys> which, to be clear, is dreadful IMO
21:16:25 <Cgettys> but still
21:16:39 <Cgettys> The trouble is, most of the time at this point I think is spent in Tornado
21:17:09 <Cgettys> e.g. a large chunk in write_message
21:21:07 <Cgettys> Also looks like tornado is entirely capable of doing the compression for us these days
21:50:13 <Cgettys> But probably won't improve perf even if we went that way
23:13:21 Unstable branch on cbro.berotato.org updated to: 0.33-a0-1120-gbd99899e7f (34)
23:35:20 Unstable branch on crawl.develz.org updated to: 0.33-a0-1120-gbd99899e7f (34)
23:50:44 <mumra> I mean, if these numbers are right, it's not explaining the issues we're seeing (but you're not looking at what a real server is doing under load with multiple games and connections running ... you're looking at your computer with a single game and a single connection ...)
23:53:11 <mumra> The thing with thread blocking is, every game process that's running is going to be waiting for that python thread to unblock. The player's keypress can't be sent to their game process until the thread unblocks. etc etc
23:58:23 Windows builds of master branch on crawl.develz.org updated to: 0.33-a0-1120-gbd99899e7f
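For reference, the tornado capability mentioned at 21:21 is built-in support for the RFC 7692 permessage-deflate extension discussed at 17:11. A minimal sketch of enabling it; GameHandler and the option values are illustrative, not webtiles' actual handler:

```python
# Sketch: letting tornado negotiate RFC 7692 permessage-deflate instead
# of hand-rolling zlib in the server. GameHandler is a hypothetical
# stand-in for the webtiles websocket handler.
import tornado.websocket

class GameHandler(tornado.websocket.WebSocketHandler):
    def get_compression_options(self):
        # Returning a dict (even an empty one) enables permessage-deflate;
        # returning None (the default) leaves it disabled.
        return {"compression_level": 1, "mem_level": 5}
```

This pushes the per-socket deflate bookkeeping into tornado itself, though as Cgettys notes, it would not by itself remove the compression CPU cost from the event loop.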