Fix workqueue thread support

I was accidentally clobbering the `stack_addr` arg, which is actually the `flags` arg for workqueues.
Also, use `stack_addr` from `args` instead of `rdi` (which is the dthread pointer), because the user may have allocated a custom stack, in which case libpthread passes it in.
Finally, workqueues have a separate flag to indicate that the TSD base has been set.