Linux kernel mirror (for testing) git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

Merge tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull timekeeping and VDSO updates from Thomas Gleixner:

- Introduce support for auxiliary timekeepers

PTP clocks can be disconnected from the universal CLOCK_TAI reality
for various reasons, including regulatory requirements for
functional safety redundancy.

So far the kernel supports only a single notion of time, which means
that all clocks are correlated in frequency and differ from each
other only by an offset.

Access to non-correlated PTP clocks has so far only been available
through the file descriptor based "POSIX clock IDs", which are
subject to locking and have to go all the way out to the hardware.

This access is not only horribly slow, because every read has to go
out to the NIC/PTP hardware, it also prevents the kernel from reading
the time of such clocks, e.g. from the network stack, where it is
required for TSN networking on both the transmit and receive side
unless the hardware provides offloading.

The auxiliary clocks provide a mechanism to support arbitrary clocks
which are not correlated to the system clock. This is deliberately
not restricted to the PTP use case, as there is no kernel side
association of these clocks with a particular PTP device; that is
purely a user space configuration decision. Keeping them independent
allows them to be used for other purposes and also enables them to be
tested without hardware dependencies.

To avoid pointless overhead, these clocks have to be enabled
individually via a new sysfs interface. This reduces the overhead in
the hot path to a single compare, and only if they are enabled at the
Kconfig level at all.

These clocks utilize the existing timekeeping/NTP infrastructure.
This was made possible over the recent releases by incrementally
converting that infrastructure from a single static instance to a
multi-instance, pointer based implementation, without any reported
performance regression.

The auxiliary clocks provide the same "emulation" of a "correct"
clock as the existing CLOCK_* variants, each with an independent
instance of data, and the same steering mechanism through the
existing sys_clock_adjtime() interface, which the chronyd(8)
maintainer has confirmed to work.

This allows the kernel to provide lockless kernel internal and VDSO
access, so that applications and kernel internal functionality can
read these clocks without restrictions and with the same performance
as the existing system clocks.

- Avoid double notifications in the adjtimex() syscall. Not a big
issue, but a trivially avoidable source of latency.
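The user space side of the workflow described above (enable an auxiliary clock via sysfs, read it, steer it through clock_adjtime()) can be sketched in C roughly as follows. This is a minimal sketch, not part of the merge: all helper names are hypothetical, the CLOCK_AUX/MAX_AUX_CLOCKS fallbacks copy the values from the include/uapi/linux/time.h change in this merge, and it is an assumption (not confirmed by this excerpt) that the sysfs <ID> directories are named after the index 0..7 and accept "1"/"0".

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/timex.h>
#include <time.h>

/* Values from include/uapi/linux/time.h in this merge; libc headers may lack them. */
#ifndef CLOCK_AUX
#define CLOCK_AUX 16
#endif
#ifndef MAX_AUX_CLOCKS
#define MAX_AUX_CLOCKS 8
#endif

/* Map auxiliary clock index 0..7 to its POSIX clock ID, or -1 if out of range. */
clockid_t aux_clock_id(int idx)
{
	if (idx < 0 || idx >= MAX_AUX_CLOCKS)
		return -1;
	return CLOCK_AUX + idx;
}

/* ASSUMPTION: the sysfs <ID> directory is named after the index 0..7. */
int aux_enable_path(int idx, char *buf, size_t len)
{
	int n;

	if (aux_clock_id(idx) == -1)
		return -1;
	n = snprintf(buf, len, "/sys/kernel/time/aux_clocks/%d/enable", idx);
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* ASSUMPTION: the enable file accepts "1"/"0". Writing it typically needs root. */
int aux_set_enabled(int idx, int on)
{
	char path[64];
	FILE *f;
	int ok;

	if (aux_enable_path(idx, path, sizeof(path)))
		return -1;
	f = fopen(path, "w");
	if (!f)
		return -1;
	ok = fputs(on ? "1\n" : "0\n", f) >= 0;
	/* A failed sysfs write may only be reported when the buffer is flushed */
	return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Read the clock; fails with errno set if unsupported or not enabled. */
int aux_read(int idx, struct timespec *ts)
{
	clockid_t id = aux_clock_id(idx);

	if (id == -1) {
		errno = EINVAL;
		return -1;
	}
	return clock_gettime(id, ts);
}

/* Prepare a frequency correction: struct timex.freq is ppm with a 16-bit fraction. */
void aux_fill_freq_adj(struct timex *tx, double ppm)
{
	memset(tx, 0, sizeof(*tx));
	tx->modes = ADJ_FREQUENCY;
	tx->freq = (long)(ppm * 65536.0);
}

/* Steer the clock via clock_adjtime(); returns the clock state or -1. Needs CAP_SYS_TIME. */
int aux_steer_freq(int idx, double ppm)
{
	struct timex tx;
	clockid_t id = aux_clock_id(idx);

	if (id == -1) {
		errno = EINVAL;
		return -1;
	}
	aux_fill_freq_adj(&tx, ppm);
	return clock_adjtime(id, &tx);
}
```

On kernels without CONFIG_POSIX_AUX_CLOCKS, or before the clock has been enabled, aux_read() is expected to fail; the exact errno the kernel reports is not shown in this excerpt.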

* tag 'timers-ptp-2025-07-27' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (39 commits)
vdso/gettimeofday: Add support for auxiliary clocks
vdso/vsyscall: Update auxiliary clock data in the datapage
vdso: Introduce aux_clock_resolution_ns()
vdso/gettimeofday: Introduce vdso_get_timestamp()
vdso/gettimeofday: Introduce vdso_set_timespec()
vdso/gettimeofday: Introduce vdso_clockid_valid()
vdso/gettimeofday: Return bool from clock_gettime() helpers
vdso/gettimeofday: Return bool from clock_getres() helpers
vdso/helpers: Add helpers for seqlocks of single vdso_clock
vdso/vsyscall: Split up __arch_update_vsyscall() into __arch_update_vdso_clock()
vdso/vsyscall: Introduce a helper to fill clock configurations
timekeeping: Remove the temporary CLOCK_AUX workaround
timekeeping: Provide ktime_get_clock_ts64()
timekeeping: Provide interface to control auxiliary clocks
timekeeping: Provide update for auxiliary timekeepers
timekeeping: Provide adjtimex() for auxiliary clocks
timekeeping: Prepare do_adtimex() for auxiliary clocks
timekeeping: Make do_adjtimex() reusable
timekeeping: Add auxiliary clock support to __timekeeping_inject_offset()
timekeeping: Make timekeeping_inject_offset() reusable
...

+963 -249
+5
Documentation/ABI/stable/sysfs-kernel-time-aux-clocks
···
+What:		/sys/kernel/time/aux_clocks/<ID>/enable
+Date:		May 2025
+Contact:	Thomas Gleixner <tglx@linutronix.de>
+Description:
+		Controls the enablement of auxiliary clock timekeepers.
+3 -4
arch/arm64/include/asm/vdso/vsyscall.h
···
  * Update the vDSO data page to keep in sync with kernel timekeeping.
  */
 static __always_inline
-void __arm64_update_vsyscall(struct vdso_time_data *vdata)
+void __arch_update_vdso_clock(struct vdso_clock *vc)
 {
-	vdata->clock_data[CS_HRES_COARSE].mask = VDSO_PRECISION_MASK;
-	vdata->clock_data[CS_RAW].mask = VDSO_PRECISION_MASK;
+	vc->mask = VDSO_PRECISION_MASK;
 }
-#define __arch_update_vsyscall __arm64_update_vsyscall
+#define __arch_update_vdso_clock __arch_update_vdso_clock
 
 /* The asm-generic header needs to be included after the definitions above */
 #include <asm-generic/vdso/vsyscall.h>
+3 -3
include/asm-generic/vdso/vsyscall.h
···
 
 #endif /* CONFIG_GENERIC_VDSO_DATA_STORE */
 
-#ifndef __arch_update_vsyscall
-static __always_inline void __arch_update_vsyscall(struct vdso_time_data *vdata)
+#ifndef __arch_update_vdso_clock
+static __always_inline void __arch_update_vdso_clock(struct vdso_clock *vc)
 {
 }
-#endif /* __arch_update_vsyscall */
+#endif /* __arch_update_vdso_clock */
 
 #ifndef __arch_sync_vdso_time_data
 static __always_inline void __arch_sync_vdso_time_data(struct vdso_time_data *vdata)
+5
include/linux/posix-timers.h
···
 	return ~(clk >> 3);
 }
 
+static inline bool clockid_aux_valid(clockid_t id)
+{
+	return IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS) && id >= CLOCK_AUX && id <= CLOCK_AUX_LAST;
+}
+
 #ifdef CONFIG_POSIX_TIMERS
 
 #include <linux/signal_types.h>
+35 -2
include/linux/timekeeper_internal.h
···
 #include <linux/time.h>
 
 /**
+ * timekeeper_ids - IDs for various time keepers in the kernel
+ * @TIMEKEEPER_CORE:		The central core timekeeper managing system time
+ * @TIMEKEEPER_AUX_FIRST:	The first AUX timekeeper
+ * @TIMEKEEPER_AUX_LAST:	The last AUX timekeeper
+ * @TIMEKEEPERS_MAX:		The maximum number of timekeepers managed
+ */
+enum timekeeper_ids {
+	TIMEKEEPER_CORE,
+#ifdef CONFIG_POSIX_AUX_CLOCKS
+	TIMEKEEPER_AUX_FIRST,
+	TIMEKEEPER_AUX_LAST = TIMEKEEPER_AUX_FIRST + MAX_AUX_CLOCKS - 1,
+#endif
+	TIMEKEEPERS_MAX,
+};
+
+/**
  * struct tk_read_base - base structure for timekeeping readout
  * @clock:	Current clocksource used for timekeeping.
  * @mask:	Bitmask for two's complement subtraction of non 64bit clocks
···
  * @offs_real:		Offset clock monotonic -> clock realtime
  * @offs_boot:		Offset clock monotonic -> clock boottime
  * @offs_tai:		Offset clock monotonic -> clock tai
+ * @offs_aux:		Offset clock monotonic -> clock AUX
  * @coarse_nsec:	The nanoseconds part for coarse time getters
+ * @id:			The timekeeper ID
  * @tkr_raw:		The readout base structure for CLOCK_MONOTONIC_RAW
  * @raw_sec:		CLOCK_MONOTONIC_RAW time in seconds
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
+ * @clock_valid:	Indicator for valid clock
  * @monotonic_to_boot:	CLOCK_MONOTONIC to CLOCK_BOOTTIME offset
  * @cycle_interval:	Number of clock cycles in one NTP interval
  * @xtime_interval:	Number of clock shifted nano seconds in one NTP
···
  * @monotonic_to_boottime is a timespec64 representation of @offs_boot to
  * accelerate the VDSO update for CLOCK_BOOTTIME.
  *
+ * @offs_aux is used by the auxiliary timekeepers which do not utilize any
+ * of the regular timekeeper offset fields.
+ *
  * The cacheline ordering of the structure is optimized for in kernel usage of
  * the ktime_get() and ktime_get_ts64() family of time accessors. Struct
  * timekeeper is prepended in the core timekeeping code with a sequence count,
  * which results in the following cacheline layout:
  *
  * 0:	seqcount, tkr_mono
- * 1:	xtime_sec ... coarse_nsec
+ * 1:	xtime_sec ... id
  * 2:	tkr_raw, raw_sec
  * 3,4: Internal variables
  *
···
 	struct timespec64	wall_to_monotonic;
 	ktime_t			offs_real;
 	ktime_t			offs_boot;
-	ktime_t			offs_tai;
+	union {
+		ktime_t		offs_tai;
+		ktime_t		offs_aux;
+	};
 	u32			coarse_nsec;
+	enum timekeeper_ids	id;
 
 	/* Cacheline 2: */
 	struct tk_read_base	tkr_raw;
···
 	/* Cachline 3 and 4 (timekeeping internal variables): */
 	unsigned int		clock_was_set_seq;
 	u8			cs_was_changed_seq;
+	u8			clock_valid;
 
 	struct timespec64	monotonic_to_boot;
 
···
 static inline void update_vsyscall_tz(void)
 {
 }
+#endif
+
+#if defined(CONFIG_GENERIC_GETTIMEOFDAY) && defined(CONFIG_POSIX_AUX_CLOCKS)
+extern void vdso_time_update_aux(struct timekeeper *tk);
+#else
+static inline void vdso_time_update_aux(struct timekeeper *tk) { }
 #endif
 
 #endif /* _LINUX_TIMEKEEPER_INTERNAL_H */
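Together, clockid_aux_valid() and enum timekeeper_ids imply a fixed mapping between POSIX clock IDs 16..23 and timekeeper IDs 1..8. The following self-contained sketch shows that mapping, with the constants copied from this merge and CONFIG_POSIX_AUX_CLOCKS=y assumed. aux_clockid_to_tkid() and aux_tkid_to_clockid() are hypothetical helpers; the latter uses the same arithmetic as tk_get_aux_ts64() in the timekeeping.c changes.

```c
#include <stdbool.h>

/* Constants as defined in include/uapi/linux/time.h by this merge. */
#define MAX_CLOCKS	16
#define CLOCK_AUX	MAX_CLOCKS
#define MAX_AUX_CLOCKS	8
#define CLOCK_AUX_LAST	(CLOCK_AUX + MAX_AUX_CLOCKS - 1)

/* enum timekeeper_ids as it looks with CONFIG_POSIX_AUX_CLOCKS=y. */
enum timekeeper_ids {
	TIMEKEEPER_CORE,
	TIMEKEEPER_AUX_FIRST,
	TIMEKEEPER_AUX_LAST = TIMEKEEPER_AUX_FIRST + MAX_AUX_CLOCKS - 1,
	TIMEKEEPERS_MAX,
};

/* clockid_aux_valid() without the IS_ENABLED() part. */
static inline bool clockid_aux_valid(int id)
{
	return id >= CLOCK_AUX && id <= CLOCK_AUX_LAST;
}

/* Hypothetical helper: POSIX clock ID -> timekeeper ID, or -1. */
int aux_clockid_to_tkid(int id)
{
	if (!clockid_aux_valid(id))
		return -1;
	return TIMEKEEPER_AUX_FIRST + (id - CLOCK_AUX);
}

/* Hypothetical inverse, same arithmetic as tk_get_aux_ts64(). */
int aux_tkid_to_clockid(int tkid)
{
	if (tkid < TIMEKEEPER_AUX_FIRST || tkid > TIMEKEEPER_AUX_LAST)
		return -1;
	return CLOCK_AUX + tkid - TIMEKEEPER_AUX_FIRST;
}
```

Slot 0 stays reserved for the core timekeeper, so arrays sized TIMEKEEPERS_MAX (such as the per-timekeeper NTP data) hold one core entry plus eight auxiliary entries.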
+12
include/linux/timekeeping.h
···
 extern void ktime_get_real_ts64(struct timespec64 *tv);
 extern void ktime_get_coarse_ts64(struct timespec64 *ts);
 extern void ktime_get_coarse_real_ts64(struct timespec64 *ts);
+extern void ktime_get_clock_ts64(clockid_t id, struct timespec64 *ts);
 
 /* Multigrain timestamp interfaces */
 extern void ktime_get_coarse_real_ts64_mg(struct timespec64 *ts);
···
 extern bool timekeeping_rtc_skipresume(void);
 
 extern void timekeeping_inject_sleeptime64(const struct timespec64 *delta);
+
+/*
+ * Auxiliary clock interfaces
+ */
+#ifdef CONFIG_POSIX_AUX_CLOCKS
+extern bool ktime_get_aux(clockid_t id, ktime_t *kt);
+extern bool ktime_get_aux_ts64(clockid_t id, struct timespec64 *kt);
+#else
+static inline bool ktime_get_aux(clockid_t id, ktime_t *kt) { return false; }
+static inline bool ktime_get_aux_ts64(clockid_t id, struct timespec64 *kt) { return false; }
+#endif
 
 /**
  * struct system_time_snapshot - simultaneous raw/real time capture with
+11
include/uapi/linux/time.h
···
 #define CLOCK_TAI			11
 
 #define MAX_CLOCKS			16
+
+/*
+ * AUX clock support. AUXiliary clocks are dynamically configured by
+ * enabling a clock ID. These clock can be steered independently of the
+ * core timekeeper. The kernel can support up to 8 auxiliary clocks, but
+ * the actual limit depends on eventual architecture constraints vs. VDSO.
+ */
+#define CLOCK_AUX			MAX_CLOCKS
+#define MAX_AUX_CLOCKS			8
+#define CLOCK_AUX_LAST			(CLOCK_AUX + MAX_AUX_CLOCKS - 1)
+
 #define CLOCKS_MASK			(CLOCK_REALTIME | CLOCK_MONOTONIC)
 #define CLOCKS_MONO			CLOCK_MONOTONIC
 
+13
include/vdso/auxclock.h
···
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _VDSO_AUXCLOCK_H
+#define _VDSO_AUXCLOCK_H
+
+#include <uapi/linux/time.h>
+#include <uapi/linux/types.h>
+
+static __always_inline u64 aux_clock_resolution_ns(void)
+{
+	return 1;
+}
+
+#endif /* _VDSO_AUXCLOCK_H */
+5
include/vdso/datapage.h
···
 #ifndef __ASSEMBLY__
 
 #include <linux/compiler.h>
+#include <uapi/linux/bits.h>
 #include <uapi/linux/time.h>
 #include <uapi/linux/types.h>
 #include <uapi/asm-generic/errno-base.h>
···
 #endif
 
 #define VDSO_BASES	(CLOCK_TAI + 1)
+#define VDSO_BASE_AUX	0
 #define VDSO_HRES	(BIT(CLOCK_REALTIME)		| \
 			 BIT(CLOCK_MONOTONIC)		| \
 			 BIT(CLOCK_BOOTTIME)		| \
···
 #define VDSO_COARSE	(BIT(CLOCK_REALTIME_COARSE)	| \
 			 BIT(CLOCK_MONOTONIC_COARSE))
 #define VDSO_RAW	(BIT(CLOCK_MONOTONIC_RAW))
+#define VDSO_AUX	__GENMASK(CLOCK_AUX_LAST, CLOCK_AUX)
 
 #define CS_HRES_COARSE	0
 #define CS_RAW		1
···
  * @arch_data:		architecture specific data (optional, defaults
  *			to an empty struct)
  * @clock_data:		clocksource related data (array)
+ * @aux_clock_data:	auxiliary clocksource related data (array)
  * @tz_minuteswest:	minutes west of Greenwich
  * @tz_dsttime:		type of DST correction
  * @hrtimer_res:	hrtimer resolution
···
 	struct arch_vdso_time_data	arch_data;
 
 	struct vdso_clock	clock_data[CS_BASES];
+	struct vdso_clock	aux_clock_data[MAX_AUX_CLOCKS];
 
 	s32			tz_minuteswest;
 	s32			tz_dsttime;
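VDSO_AUX marks clock IDs 16..23 in the same bitmask style as VDSO_HRES, VDSO_COARSE and VDSO_RAW, so a clock ID can presumably be classified with a single bit test. The following is a sketch of the arithmetic only. GENMASK32() is a hypothetical 32-bit stand-in for the kernel's __GENMASK(), which operates on unsigned long, and vdso_clock_is_aux() is a hypothetical helper rather than the vDSO's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

#define CLOCK_AUX	16
#define MAX_AUX_CLOCKS	8
#define CLOCK_AUX_LAST	(CLOCK_AUX + MAX_AUX_CLOCKS - 1)

/* Hypothetical 32-bit stand-in for __GENMASK(h, l): bits l..h set (needs h < 32). */
#define GENMASK32(h, l) \
	((UINT32_MAX >> (31 - (h))) & (UINT32_MAX << (l)))

#define VDSO_AUX	GENMASK32(CLOCK_AUX_LAST, CLOCK_AUX)

/* Classify a clock ID by testing its bit against the mask. */
static inline bool vdso_clock_is_aux(unsigned int clock)
{
	return clock < 32 && ((UINT32_C(1) << clock) & VDSO_AUX);
}
```

With CLOCK_AUX = 16 and eight clocks, the mask evaluates to 0x00FF0000, so IDs 16 through 23 match and CLOCK_TAI (11) does not.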
+40 -14
include/vdso/helpers.h
···
 	return seq != start;
 }
 
+static __always_inline void vdso_write_seq_begin(struct vdso_clock *vc)
+{
+	/*
+	 * WRITE_ONCE() is required otherwise the compiler can validly tear
+	 * updates to vc->seq and it is possible that the value seen by the
+	 * reader is inconsistent.
+	 */
+	WRITE_ONCE(vc->seq, vc->seq + 1);
+}
+
+static __always_inline void vdso_write_seq_end(struct vdso_clock *vc)
+{
+	/*
+	 * WRITE_ONCE() is required otherwise the compiler can validly tear
+	 * updates to vc->seq and it is possible that the value seen by the
+	 * reader is inconsistent.
+	 */
+	WRITE_ONCE(vc->seq, vc->seq + 1);
+}
+
+static __always_inline void vdso_write_begin_clock(struct vdso_clock *vc)
+{
+	vdso_write_seq_begin(vc);
+	/* Ensure the sequence invalidation is visible before data is modified */
+	smp_wmb();
+}
+
+static __always_inline void vdso_write_end_clock(struct vdso_clock *vc)
+{
+	/* Ensure the data update is visible before the sequence is set valid again */
+	smp_wmb();
+	vdso_write_seq_end(vc);
+}
+
 static __always_inline void vdso_write_begin(struct vdso_time_data *vd)
 {
 	struct vdso_clock *vc = vd->clock_data;
 
-	/*
-	 * WRITE_ONCE() is required otherwise the compiler can validly tear
-	 * updates to vd[x].seq and it is possible that the value seen by the
-	 * reader is inconsistent.
-	 */
-	WRITE_ONCE(vc[CS_HRES_COARSE].seq, vc[CS_HRES_COARSE].seq + 1);
-	WRITE_ONCE(vc[CS_RAW].seq, vc[CS_RAW].seq + 1);
+	vdso_write_seq_begin(&vc[CS_HRES_COARSE]);
+	vdso_write_seq_begin(&vc[CS_RAW]);
+	/* Ensure the sequence invalidation is visible before data is modified */
 	smp_wmb();
 }
 
···
 {
 	struct vdso_clock *vc = vd->clock_data;
 
+	/* Ensure the data update is visible before the sequence is set valid again */
 	smp_wmb();
-	/*
-	 * WRITE_ONCE() is required otherwise the compiler can validly tear
-	 * updates to vd[x].seq and it is possible that the value seen by the
-	 * reader is inconsistent.
-	 */
-	WRITE_ONCE(vc[CS_HRES_COARSE].seq, vc[CS_HRES_COARSE].seq + 1);
-	WRITE_ONCE(vc[CS_RAW].seq, vc[CS_RAW].seq + 1);
+	vdso_write_seq_end(&vc[CS_HRES_COARSE]);
+	vdso_write_seq_end(&vc[CS_RAW]);
 }
 
 #endif /* !__ASSEMBLY__ */
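vdso_write_begin_clock() and vdso_write_end_clock() are the writer half of a sequence counter. The count is odd while an update is in flight, and a reader retries if it saw an odd value or if the value changed while it was reading. The following user space sketch implements both halves with C11 atomics and fences. The struct and function names are hypothetical, and the kernel uses WRITE_ONCE()/smp_wmb() and its own read helpers rather than C11 atomics; the payload fields are relaxed atomics here only to keep the C11 version free of data races.

```c
#include <stdatomic.h>

/* Hypothetical single-writer record, loosely modelled on struct vdso_clock. */
struct seq_clock {
	atomic_uint	seq;	/* odd while an update is in progress */
	atomic_llong	sec;
	atomic_llong	nsec;
};

void seq_init(struct seq_clock *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->sec, 0);
	atomic_init(&c->nsec, 0);
}

void seq_write(struct seq_clock *c, long long sec, long long nsec)
{
	unsigned int s = atomic_load_explicit(&c->seq, memory_order_relaxed);

	/* Like vdso_write_begin_clock(): make seq odd, order data writes after it */
	atomic_store_explicit(&c->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);

	atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);

	/* Like vdso_write_end_clock(): publish data before seq becomes even again */
	atomic_store_explicit(&c->seq, s + 2, memory_order_release);
}

void seq_read(struct seq_clock *c, long long *sec, long long *nsec)
{
	for (;;) {
		unsigned int start = atomic_load_explicit(&c->seq, memory_order_acquire);

		if (start & 1)
			continue;	/* writer active: retry */
		*sec = atomic_load_explicit(&c->sec, memory_order_relaxed);
		*nsec = atomic_load_explicit(&c->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
		if (atomic_load_explicit(&c->seq, memory_order_relaxed) == start)
			return;		/* no concurrent update: snapshot is consistent */
	}
}
```

Splitting the helpers per struct vdso_clock, as the diff does, lets each auxiliary clock be updated under its own sequence count without invalidating the CS_HRES_COARSE and CS_RAW entries.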
+13 -2
kernel/time/Kconfig
···
 	help
 	  Tracks idle state on behalf of RCU.
 
-if GENERIC_CLOCKEVENTS
 menu "Timers subsystem"
 
+if GENERIC_CLOCKEVENTS
 # Core internal switch. Selected by NO_HZ_COMMON / HIGH_RES_TIMERS. This is
 # only related to the tick functionality. Oneshot clockevent devices
 # are supported independent of this.
···
 	  interval and NTP's maximum frequency drift of 500 parts
 	  per million. If the clocksource is good enough for NTP,
 	  it is good enough for the clocksource watchdog!
+endif
+
+config POSIX_AUX_CLOCKS
+	bool "Enable auxiliary POSIX clocks"
+	depends on POSIX_TIMERS
+	help
+	  Auxiliary POSIX clocks are clocks which can be steered
+	  independently of the core timekeeper, which controls the
+	  MONOTONIC, REALTIME, BOOTTIME and TAI clocks. They are useful to
+	  provide e.g. lockless time accessors to independent PTP clocks
+	  and other clock domains, which are not correlated to the TAI/NTP
+	  notion of time.
 
 endmenu
-endif
+5
kernel/time/namespace.c
···
 	for (i = 0; i < CS_BASES; i++)
 		timens_setup_vdso_clock_data(&vc[i], ns);
 
+	if (IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS)) {
+		for (i = 0; i < ARRAY_SIZE(vdata->aux_clock_data); i++)
+			timens_setup_vdso_clock_data(&vdata->aux_clock_data[i], ns);
+	}
+
 out:
 	mutex_unlock(&offset_lock);
 }
+40 -32
kernel/time/ntp.c
···
 #include <linux/module.h>
 #include <linux/rtc.h>
 #include <linux/audit.h>
+#include <linux/timekeeper_internal.h>
 
 #include "ntp_internal.h"
 #include "timekeeping_internal.h"
···
 #endif
 };
 
-static struct ntp_data tk_ntp_data = {
-	.tick_usec		= USER_TICK_USEC,
-	.time_state		= TIME_OK,
-	.time_status		= STA_UNSYNC,
-	.time_constant		= 2,
-	.time_maxerror		= NTP_PHASE_LIMIT,
-	.time_esterror		= NTP_PHASE_LIMIT,
-	.ntp_next_leap_sec	= TIME64_MAX,
+static struct ntp_data tk_ntp_data[TIMEKEEPERS_MAX] = {
+	[ 0 ... TIMEKEEPERS_MAX - 1 ] = {
+		.tick_usec		= USER_TICK_USEC,
+		.time_state		= TIME_OK,
+		.time_status		= STA_UNSYNC,
+		.time_constant		= 2,
+		.time_maxerror		= NTP_PHASE_LIMIT,
+		.time_esterror		= NTP_PHASE_LIMIT,
+		.ntp_next_leap_sec	= TIME64_MAX,
+	},
 };
 
 #define SECS_PER_DAY		86400
···
 	 * Select how the frequency is to be controlled
 	 * and in which mode (PLL or FLL).
 	 */
-	real_secs = __ktime_get_real_seconds();
+	real_secs = ktime_get_ntp_seconds(ntpdata - tk_ntp_data);
 	secs = (long)(real_secs - ntpdata->time_reftime);
 	if (unlikely(ntpdata->time_status & STA_FREQHOLD))
 		secs = 0;
···
 
 /**
  * ntp_clear - Clears the NTP state variables
+ * @tkid:	Timekeeper ID to be able to select proper ntp data array member
  */
-void ntp_clear(void)
+void ntp_clear(unsigned int tkid)
 {
-	__ntp_clear(&tk_ntp_data);
+	__ntp_clear(&tk_ntp_data[tkid]);
 }
 
 
-u64 ntp_tick_length(void)
+u64 ntp_tick_length(unsigned int tkid)
 {
-	return tk_ntp_data.tick_length;
+	return tk_ntp_data[tkid].tick_length;
 }
 
 /**
  * ntp_get_next_leap - Returns the next leapsecond in CLOCK_REALTIME ktime_t
+ * @tkid:	Timekeeper ID
  *
- * Provides the time of the next leapsecond against CLOCK_REALTIME in
- * a ktime_t format. Returns KTIME_MAX if no leapsecond is pending.
+ * Returns:	For @tkid == TIMEKEEPER_CORE this provides the time of the next
+ *		leap second against CLOCK_REALTIME in a ktime_t format if a
+ *		leap second is pending. KTIME_MAX otherwise.
  */
-ktime_t ntp_get_next_leap(void)
+ktime_t ntp_get_next_leap(unsigned int tkid)
 {
-	struct ntp_data *ntpdata = &tk_ntp_data;
-	ktime_t ret;
+	struct ntp_data *ntpdata = &tk_ntp_data[TIMEKEEPER_CORE];
+
+	if (tkid != TIMEKEEPER_CORE)
+		return KTIME_MAX;
 
 	if ((ntpdata->time_state == TIME_INS) && (ntpdata->time_status & STA_INS))
 		return ktime_set(ntpdata->ntp_next_leap_sec, 0);
-	ret = KTIME_MAX;
-	return ret;
+
+	return KTIME_MAX;
 }
 
 /*
···
  *
  * Also handles leap second processing, and returns leap offset
  */
-int second_overflow(time64_t secs)
+int second_overflow(unsigned int tkid, time64_t secs)
 {
-	struct ntp_data *ntpdata = &tk_ntp_data;
+	struct ntp_data *ntpdata = &tk_ntp_data[tkid];
 	s64 delta;
 	int leap = 0;
 	s32 rem;
···
  */
 static inline bool ntp_synced(void)
 {
-	return !(tk_ntp_data.time_status & STA_UNSYNC);
+	return !(tk_ntp_data[TIMEKEEPER_CORE].time_status & STA_UNSYNC);
 }
 
 /*
···
 	 * reference time to current time.
 	 */
 	if (!(ntpdata->time_status & STA_PLL) && (txc->status & STA_PLL))
-		ntpdata->time_reftime = __ktime_get_real_seconds();
+		ntpdata->time_reftime = ktime_get_ntp_seconds(ntpdata - tk_ntp_data);
 
 	/* only set allowed bits */
 	ntpdata->time_status &= STA_RONLY;
···
  * adjtimex() mainly allows reading (and writing, if superuser) of
  * kernel time-keeping variables. used by xntpd.
  */
-int __do_adjtimex(struct __kernel_timex *txc, const struct timespec64 *ts,
-		  s32 *time_tai, struct audit_ntp_data *ad)
+int ntp_adjtimex(unsigned int tkid, struct __kernel_timex *txc, const struct timespec64 *ts,
+		 s32 *time_tai, struct audit_ntp_data *ad)
 {
-	struct ntp_data *ntpdata = &tk_ntp_data;
+	struct ntp_data *ntpdata = &tk_ntp_data[tkid];
 	int result;
 
 	if (txc->modes & ADJ_ADJTIME) {
···
  */
 void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts)
 {
+	struct ntp_data *ntpdata = &tk_ntp_data[TIMEKEEPER_CORE];
 	struct pps_normtime pts_norm, freq_norm;
-	struct ntp_data *ntpdata = &tk_ntp_data;
 
 	pts_norm = pps_normalize_ts(*phase_ts);
 
···
 
 static int __init ntp_tick_adj_setup(char *str)
 {
-	int rc = kstrtos64(str, 0, &tk_ntp_data.ntp_tick_adj);
+	int rc = kstrtos64(str, 0, &tk_ntp_data[TIMEKEEPER_CORE].ntp_tick_adj);
 	if (rc)
 		return rc;
 
-	tk_ntp_data.ntp_tick_adj <<= NTP_SCALE_SHIFT;
+	tk_ntp_data[TIMEKEEPER_CORE].ntp_tick_adj <<= NTP_SCALE_SHIFT;
 	return 1;
 }
-
 __setup("ntp_tick_adj=", ntp_tick_adj_setup);
 
 void __init ntp_init(void)
 {
-	ntp_clear();
+	for (int id = 0; id < TIMEKEEPERS_MAX; id++)
+		__ntp_clear(tk_ntp_data + id);
 	ntp_init_cmos_sync();
 }
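In ntp.c, the single tk_ntp_data instance becomes an array indexed by timekeeper ID. Every element is initialised with one GNU range designator, and the ID is recovered from an element pointer by pointer subtraction (ntpdata - tk_ntp_data). The following trimmed user space sketch shows those two idioms. struct ntp_state and the helper names are hypothetical, and the range designator requires GCC or Clang.

```c
/* Needs GCC or Clang: the [a ... b] range designator is a GNU extension. */
#include <stddef.h>
#include <stdint.h>

#define TIMEKEEPERS_MAX 9	/* core + 8 aux, as with CONFIG_POSIX_AUX_CLOCKS=y */

/* Hypothetical, heavily trimmed stand-in for struct ntp_data. */
struct ntp_state {
	int	time_constant;
	int64_t	next_leap_sec;
};

/* One identically initialised instance per timekeeper, like tk_ntp_data[]. */
static struct ntp_state ntp_states[TIMEKEEPERS_MAX] = {
	[0 ... TIMEKEEPERS_MAX - 1] = {
		.time_constant	= 2,
		.next_leap_sec	= INT64_MAX,
	},
};

/* Select the instance for a timekeeper ID (NULL if out of range). */
struct ntp_state *ntp_state_get(unsigned int tkid)
{
	return tkid < TIMEKEEPERS_MAX ? &ntp_states[tkid] : NULL;
}

/* Recover the ID from an element pointer, as "ntpdata - tk_ntp_data" does. */
unsigned int ntp_state_id(const struct ntp_state *s)
{
	return (unsigned int)(s - ntp_states);
}
```

Passing the element pointer around and deriving the ID on demand avoids storing the ID in every instance, at the cost of requiring that the pointer really points into the array.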
+6 -7
kernel/time/ntp_internal.h
···
 #define _LINUX_NTP_INTERNAL_H
 
 extern void ntp_init(void);
-extern void ntp_clear(void);
+extern void ntp_clear(unsigned int tkid);
 /* Returns how long ticks are at present, in ns / 2^NTP_SCALE_SHIFT. */
-extern u64 ntp_tick_length(void);
-extern ktime_t ntp_get_next_leap(void);
-extern int second_overflow(time64_t secs);
-extern int __do_adjtimex(struct __kernel_timex *txc,
-			 const struct timespec64 *ts,
-			 s32 *time_tai, struct audit_ntp_data *ad);
+extern u64 ntp_tick_length(unsigned int tkid);
+extern ktime_t ntp_get_next_leap(unsigned int tkid);
+extern int second_overflow(unsigned int tkid, time64_t secs);
+extern int ntp_adjtimex(unsigned int tkid, struct __kernel_timex *txc, const struct timespec64 *ts,
+			s32 *time_tai, struct audit_ntp_data *ad);
 extern void __hardpps(const struct timespec64 *phase_ts, const struct timespec64 *raw_ts);
 
 #if defined(CONFIG_GENERIC_CMOS_UPDATE) || defined(CONFIG_RTC_SYSTOHC)
+3
kernel/time/posix-timers.c
···
 	[CLOCK_REALTIME_ALARM]		= &alarm_clock,
 	[CLOCK_BOOTTIME_ALARM]		= &alarm_clock,
 	[CLOCK_TAI]			= &clock_tai,
+#ifdef CONFIG_POSIX_AUX_CLOCKS
+	[CLOCK_AUX ... CLOCK_AUX_LAST]	= &clock_aux,
+#endif
 };
 
 static const struct k_clock *clockid_to_kclock(const clockid_t id)
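posix-timers.c routes all eight auxiliary clock IDs to one struct k_clock using a single range designator. The following minimal sketch shows such a dispatch table. struct clock_ops and clockid_to_ops() are hypothetical stand-ins for struct k_clock and clockid_to_kclock(). The real lookup also handles negative (CPU-time and fd-based) clock IDs and speculation hardening, which are omitted here.

```c
/* Needs GCC or Clang for the GNU range designator. */
#include <stddef.h>

/* Clock IDs as in include/uapi/linux/time.h (subset), plus the new AUX range. */
#define CLOCK_REALTIME	0
#define CLOCK_MONOTONIC	1
#define CLOCK_TAI	11
#define MAX_CLOCKS	16
#define CLOCK_AUX	MAX_CLOCKS
#define MAX_AUX_CLOCKS	8
#define CLOCK_AUX_LAST	(CLOCK_AUX + MAX_AUX_CLOCKS - 1)

/* Hypothetical, much smaller cousin of struct k_clock. */
struct clock_ops {
	const char *name;
};

static const struct clock_ops clock_realtime  = { "realtime" };
static const struct clock_ops clock_monotonic = { "monotonic" };
static const struct clock_ops clock_tai       = { "tai" };
static const struct clock_ops clock_aux       = { "aux" };

/* The array size follows from the highest designator: CLOCK_AUX_LAST + 1. */
static const struct clock_ops *const clock_table[] = {
	[CLOCK_REALTIME]		= &clock_realtime,
	[CLOCK_MONOTONIC]		= &clock_monotonic,
	[CLOCK_TAI]			= &clock_tai,
	[CLOCK_AUX ... CLOCK_AUX_LAST]	= &clock_aux,	/* all 8 AUX IDs share one ops */
};

#define CLOCK_TABLE_SIZE (sizeof(clock_table) / sizeof(clock_table[0]))

/* Bounds-checked lookup; unset slots are NULL, i.e. unsupported clock IDs. */
const struct clock_ops *clockid_to_ops(int id)
{
	if (id < 0 || (size_t)id >= CLOCK_TABLE_SIZE)
		return NULL;
	return clock_table[id];
}
```

Because one ops structure serves all auxiliary IDs, its callbacks have to derive the specific auxiliary clock from the clock ID they are passed.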
+1
kernel/time/posix-timers.h
···
 extern const struct k_clock clock_process;
 extern const struct k_clock clock_thread;
 extern const struct k_clock alarm_clock;
+extern const struct k_clock clock_aux;
 
 void posix_timer_queue_signal(struct k_itimer *timr);
 
+565 -86
kernel/time/timekeeping.c
···
 #include <linux/timekeeper_internal.h>
 #include <linux/module.h>
 #include <linux/interrupt.h>
+#include <linux/kobject.h>
 #include <linux/percpu.h>
 #include <linux/init.h>
 #include <linux/mm.h>
···
 #include <linux/compiler.h>
 #include <linux/audit.h>
 #include <linux/random.h>
+
+#include <vdso/auxclock.h>
 
 #include "tick-internal.h"
 #include "ntp_internal.h"
···
 	raw_spinlock_t		lock;
 } ____cacheline_aligned;
 
-static struct tk_data tk_core;
+static struct tk_data timekeeper_data[TIMEKEEPERS_MAX];
+
+/* The core timekeeper */
+#define tk_core		(timekeeper_data[TIMEKEEPER_CORE])
+
+#ifdef CONFIG_POSIX_AUX_CLOCKS
+static inline bool tk_get_aux_ts64(unsigned int tkid, struct timespec64 *ts)
+{
+	return ktime_get_aux_ts64(CLOCK_AUX + tkid - TIMEKEEPER_AUX_FIRST, ts);
+}
+
+static inline bool tk_is_aux(const struct timekeeper *tk)
+{
+	return tk->id >= TIMEKEEPER_AUX_FIRST && tk->id <= TIMEKEEPER_AUX_LAST;
+}
+#else
+static inline bool tk_get_aux_ts64(unsigned int tkid, struct timespec64 *ts)
+{
+	return false;
+}
+
+static inline bool tk_is_aux(const struct timekeeper *tk)
+{
+	return false;
+}
+#endif
 
 /* flag for if timekeeping is suspended */
 int __read_mostly timekeeping_suspended;
···
 	.base[0] = FAST_TK_INIT,
 	.base[1] = FAST_TK_INIT,
 };
+
+#ifdef CONFIG_POSIX_AUX_CLOCKS
+static __init void tk_aux_setup(void);
+static void tk_aux_update_clocksource(void);
+static void tk_aux_advance(void);
+#else
+static inline void tk_aux_setup(void) { }
+static inline void tk_aux_update_clocksource(void) { }
+static inline void tk_aux_advance(void) { }
+#endif
 
 unsigned long timekeeper_lock_irqsave(void)
 {
···
  */
 static inline void tk_update_leap_state(struct timekeeper *tk)
 {
-	tk->next_leap_ktime = ntp_get_next_leap();
+	tk->next_leap_ktime = ntp_get_next_leap(tk->id);
 	if (tk->next_leap_ktime != KTIME_MAX)
 		/* Convert to monotonic time */
 		tk->next_leap_ktime = ktime_sub(tk->next_leap_ktime, tk->offs_real);
···
 
 static void timekeeping_update_from_shadow(struct tk_data *tkd, unsigned int action)
 {
-	struct timekeeper *tk = &tk_core.shadow_timekeeper;
+	struct timekeeper *tk = &tkd->shadow_timekeeper;
 
 	lockdep_assert_held(&tkd->lock);
 
···
 
 	if (action & TK_CLEAR_NTP) {
 		tk->ntp_error = 0;
-		ntp_clear();
+		ntp_clear(tk->id);
 	}
 
 	tk_update_leap_state(tk);
 	tk_update_ktime_data(tk);
-
-	update_vsyscall(tk);
-	update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
-
 	tk->tkr_mono.base_real = tk->tkr_mono.base + tk->offs_real;
-	update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono);
-	update_fast_timekeeper(&tk->tkr_raw, &tk_fast_raw);
+
+	if (tk->id == TIMEKEEPER_CORE) {
+		update_vsyscall(tk);
+		update_pvclock_gtod(tk, action & TK_CLOCK_WAS_SET);
+
+		update_fast_timekeeper(&tk->tkr_mono, &tk_fast_mono);
+		update_fast_timekeeper(&tk->tkr_raw, &tk_fast_raw);
+	} else if (tk_is_aux(tk)) {
+		vdso_time_update_aux(tk);
+	}
 
 	if (action & TK_CLOCK_WAS_SET)
 		tk->clock_was_set_seq++;
···
 EXPORT_SYMBOL_GPL(ktime_get_real_seconds);
 
 /**
- * __ktime_get_real_seconds - The same as ktime_get_real_seconds
- * but without the sequence counter protect. This internal function
- * is called just when timekeeping lock is already held.
+ * __ktime_get_real_seconds - Unprotected access to CLOCK_REALTIME seconds
+ *
+ * The same as ktime_get_real_seconds() but without the sequence counter
+ * protection. This function is used in restricted contexts like the x86 MCE
+ * handler and in KGDB. It's unprotected on 32-bit vs. concurrent half
+ * completed modification and only to be used for such critical contexts.
+ *
+ * Returns: Racy snapshot of the CLOCK_REALTIME seconds value
  */
 noinstr time64_t __ktime_get_real_seconds(void)
 {
···
 }
 EXPORT_SYMBOL(do_settimeofday64);
 
+static inline bool timekeeper_is_core_tk(struct timekeeper *tk)
+{
+	return !IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS) || tk->id == TIMEKEEPER_CORE;
+}
+
 /**
- * timekeeping_inject_offset - Adds or subtracts from the current time.
+ * __timekeeping_inject_offset - Adds or subtracts from the current time.
+ * @tkd:	Pointer to the timekeeper to modify
  * @ts:		Pointer to the timespec variable containing the offset
  *
  * Adds or subtracts an offset value from the current time.
  */
-static int timekeeping_inject_offset(const struct timespec64 *ts)
+static int __timekeeping_inject_offset(struct tk_data *tkd, const struct timespec64 *ts)
 {
+	struct timekeeper *tks = &tkd->shadow_timekeeper;
+	struct timespec64 tmp;
+
 	if (ts->tv_nsec < 0 || ts->tv_nsec >= NSEC_PER_SEC)
 		return -EINVAL;
 
-	scoped_guard (raw_spinlock_irqsave, &tk_core.lock) {
-		struct timekeeper *tks = &tk_core.shadow_timekeeper;
-		struct timespec64 tmp;
+	timekeeping_forward_now(tks);
 
-		timekeeping_forward_now(tks);
-
+	if (timekeeper_is_core_tk(tks)) {
 		/* Make sure the proposed value is valid */
 		tmp = timespec64_add(tk_xtime(tks), *ts);
 		if (timespec64_compare(&tks->wall_to_monotonic, ts) > 0 ||
 		    !timespec64_valid_settod(&tmp)) {
-			timekeeping_restore_shadow(&tk_core);
+			timekeeping_restore_shadow(tkd);
 			return -EINVAL;
 		}
 
 		tk_xtime_add(tks, ts);
 		tk_set_wall_to_mono(tks, timespec64_sub(tks->wall_to_monotonic, *ts));
-		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
+	} else {
+		struct tk_read_base *tkr_mono = &tks->tkr_mono;
+		ktime_t now, offs;
+
+		/* Get the current time */
+		now = ktime_add_ns(tkr_mono->base, timekeeping_get_ns(tkr_mono));
+		/* Add the relative offset change */
+		offs = ktime_add(tks->offs_aux, timespec64_to_ktime(*ts));
+
+		/* Prevent that the resulting time becomes negative */
+		if (ktime_add(now, offs) < 0) {
+			timekeeping_restore_shadow(tkd);
+			return -EINVAL;
+		}
+		tks->offs_aux = offs;
 	}
 
-	/* Signal hrtimers about time change */
-	clock_was_set(CLOCK_SET_WALL);
+	timekeeping_update_from_shadow(tkd, TK_UPDATE_ALL);
 	return 0;
+}
+
+static int timekeeping_inject_offset(const struct timespec64 *ts)
+{
+	int ret;
+
+	scoped_guard (raw_spinlock_irqsave, &tk_core.lock)
+		ret = __timekeeping_inject_offset(&tk_core, ts);
+
+	/* Signal hrtimers about time change */
+	if (!ret)
+		clock_was_set(CLOCK_SET_WALL);
+	return ret;
 }
 
 /*
···
 		timekeeping_update_from_shadow(&tk_core, TK_UPDATE_ALL);
 	}
 
+	tk_aux_update_clocksource();
+
 	if (old) {
 		if (old->disable)
 			old->disable(old);
···
 }
 EXPORT_SYMBOL(ktime_get_raw_ts64);
 
+/**
+ * ktime_get_clock_ts64 - Returns time of a clock in a timespec
+ * @id:		POSIX clock ID of the clock to read
+ * @ts:		Pointer to the timespec64 to be set
+ *
+ * The timestamp is invalidated (@ts->sec is set to -1) if the
+ * clock @id is not available.
+ */
+void ktime_get_clock_ts64(clockid_t id, struct timespec64 *ts)
+{
+	/* Invalidate time stamp */
+	ts->tv_sec = -1;
+	ts->tv_nsec = 0;
+
+	switch (id) {
+	case CLOCK_REALTIME:
+		ktime_get_real_ts64(ts);
+		return;
+	case CLOCK_MONOTONIC:
+		ktime_get_ts64(ts);
+		return;
+	case CLOCK_MONOTONIC_RAW:
+		ktime_get_raw_ts64(ts);
+		return;
+	case CLOCK_AUX ... CLOCK_AUX_LAST:
+		if (IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS))
+			ktime_get_aux_ts64(id, ts);
+		return;
+	default:
+		WARN_ON_ONCE(1);
+	}
+}
+EXPORT_SYMBOL_GPL(ktime_get_clock_ts64);
 
 /**
  * timekeeping_valid_for_hres - Check if timekeeping is suitable for hres
···
 	*boot_offset = ns_to_timespec64(local_clock());
 }
 
-static __init void tkd_basic_setup(struct tk_data *tkd)
+static __init void tkd_basic_setup(struct tk_data *tkd, enum timekeeper_ids tk_id, bool valid)
 {
 	raw_spin_lock_init(&tkd->lock);
 	seqcount_raw_spinlock_init(&tkd->seq, &tkd->lock);
+	tkd->timekeeper.id = tkd->shadow_timekeeper.id = tk_id;
+	tkd->timekeeper.clock_valid = tkd->shadow_timekeeper.clock_valid = valid;
 }
 
 /*
···
 	struct timekeeper *tks = &tk_core.shadow_timekeeper;
 	struct clocksource *clock;
 
-	tkd_basic_setup(&tk_core);
+	tkd_basic_setup(&tk_core, TIMEKEEPER_CORE, true);
+	tk_aux_setup();
 
 	read_persistent_wall_and_boot_offset(&wall_time, &boot_offset);
 	if (timespec64_valid_settod(&wall_time) &&
···
  */
 static void timekeeping_adjust(struct timekeeper *tk, s64 offset)
 {
-	u64 ntp_tl = ntp_tick_length();
+	u64 ntp_tl = ntp_tick_length(tk->id);
 	u32 mult;
 
 	/*
···
 		}
 
 		/* Figure out if its a leap sec and apply if needed */
-		leap = second_overflow(tk->xtime_sec);
+		leap = second_overflow(tk->id, tk->xtime_sec);
 		if (unlikely(leap)) {
 			struct timespec64 ts;
 
···
  * timekeeping_advance - Updates the timekeeper to the current time and
  * current NTP tick length
  */
-static bool timekeeping_advance(enum timekeeping_adv_mode mode)
+static bool __timekeeping_advance(struct tk_data *tkd, enum timekeeping_adv_mode mode)
 {
-	struct timekeeper *tk = &tk_core.shadow_timekeeper;
-	struct timekeeper *real_tk = &tk_core.timekeeper;
+	struct timekeeper *tk = &tkd->shadow_timekeeper;
+	struct timekeeper *real_tk = &tkd->timekeeper;
 	unsigned int clock_set = 0;
 	int shift = 0, maxshift;
 	u64 offset, orig_offset;
-
-	guard(raw_spinlock_irqsave)(&tk_core.lock);
 
 	/* Make sure we're fully resumed: */
 	if (unlikely(timekeeping_suspended))
···
 	shift = ilog2(offset) - ilog2(tk->cycle_interval);
 	shift = max(0, shift);
 	/* Bound shift to one less than what overflows tick_length */
-	maxshift = (64 - (ilog2(ntp_tick_length())+1)) - 1;
+	maxshift = (64 - (ilog2(ntp_tick_length(tk->id)) + 1)) - 1;
 	shift = min(shift, maxshift);
 	while (offset >= tk->cycle_interval) {
 		offset = logarithmic_accumulation(tk, offset, shift, &clock_set);
···
 	if (orig_offset != offset)
 		tk_update_coarse_nsecs(tk);
 
-	timekeeping_update_from_shadow(&tk_core, clock_set);
+	timekeeping_update_from_shadow(tkd, clock_set);
 
 	return !!clock_set;
+}
+
+static bool timekeeping_advance(enum timekeeping_adv_mode mode)
+{
+	guard(raw_spinlock_irqsave)(&tk_core.lock);
+	return __timekeeping_advance(&tk_core, mode);
 }
 
 /**
  * update_wall_time - Uses the current clocksource to increment the wall time
  *
+ * It also updates the enabled auxiliary clock timekeepers
  */
 void update_wall_time(void)
 {
 	if (timekeeping_advance(TK_ADV_TICK))
 		clock_was_set_delayed();
+	tk_aux_advance();
 }
 
 /**
···
 /*
  * timekeeping_validate_timex - Ensures the timex is ok for use in do_adjtimex
  */
-static int timekeeping_validate_timex(const struct __kernel_timex *txc)
+static int timekeeping_validate_timex(const struct __kernel_timex *txc, bool aux_clock)
 {
 	if (txc->modes & ADJ_ADJTIME) {
 		/* singleshot must not be used with any other mode bits */
···
 			return -EINVAL;
 	}
 
+	if (aux_clock) {
+		/* Auxiliary clocks are similar to TAI and do not have leap seconds */
+		if (txc->status & (STA_INS | STA_DEL))
+			return -EINVAL;
+
+		/* No TAI offset setting */
+		if (txc->modes & ADJ_TAI)
+			return -EINVAL;
+
+		/* No PPS support either */
+		if (txc->status & (STA_PPSFREQ | STA_PPSTIME))
+			return -EINVAL;
+	}
+
 	return 0;
 }
 
···
 }
 EXPORT_SYMBOL_GPL(random_get_entropy_fallback);
 
+struct adjtimex_result {
+	struct audit_ntp_data	ad;
+	struct timespec64	delta;
+	bool			clock_set;
+};
+
+static int __do_adjtimex(struct tk_data *tkd, struct __kernel_timex *txc,
+			 struct adjtimex_result *result)
+{
+	struct timekeeper *tks = &tkd->shadow_timekeeper;
+	bool aux_clock = !timekeeper_is_core_tk(tks);
+	struct timespec64 ts;
+	s32 orig_tai, tai;
+	int ret;
+
+	/* Validate the data before disabling interrupts */
+	ret = timekeeping_validate_timex(txc, aux_clock);
+	if (ret)
+		return ret;
+	add_device_randomness(txc, sizeof(*txc));
+
+	if (!aux_clock)
ktime_get_real_ts64(&ts); 2552 + else 2553 + tk_get_aux_ts64(tkd->timekeeper.id, &ts); 2554 + 2555 + add_device_randomness(&ts, sizeof(ts)); 2556 + 2557 + guard(raw_spinlock_irqsave)(&tkd->lock); 2558 + 2559 + if (!tks->clock_valid) 2560 + return -ENODEV; 2561 + 2562 + if (txc->modes & ADJ_SETOFFSET) { 2563 + result->delta.tv_sec = txc->time.tv_sec; 2564 + result->delta.tv_nsec = txc->time.tv_usec; 2565 + if (!(txc->modes & ADJ_NANO)) 2566 + result->delta.tv_nsec *= 1000; 2567 + ret = __timekeeping_inject_offset(tkd, &result->delta); 2568 + if (ret) 2569 + return ret; 2570 + result->clock_set = true; 2571 + } 2572 + 2573 + orig_tai = tai = tks->tai_offset; 2574 + ret = ntp_adjtimex(tks->id, txc, &ts, &tai, &result->ad); 2575 + 2576 + if (tai != orig_tai) { 2577 + __timekeeping_set_tai_offset(tks, tai); 2578 + timekeeping_update_from_shadow(tkd, TK_CLOCK_WAS_SET); 2579 + result->clock_set = true; 2580 + } else { 2581 + tk_update_leap_state_all(&tk_core); 2582 + } 2583 + 2584 + /* Update the multiplier immediately if frequency was set directly */ 2585 + if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) 2586 + result->clock_set |= __timekeeping_advance(tkd, TK_ADV_FREQ); 2587 + 2588 + return ret; 2589 + } 2590 + 2666 2591 /** 2667 2592 * do_adjtimex() - Accessor function to NTP __do_adjtimex function 2668 2593 * @txc: Pointer to kernel_timex structure containing NTP parameters 2669 2594 */ 2670 2595 int do_adjtimex(struct __kernel_timex *txc) 2671 2596 { 2672 - struct audit_ntp_data ad; 2673 - bool offset_set = false; 2674 - bool clock_set = false; 2675 - struct timespec64 ts; 2597 + struct adjtimex_result result = { }; 2676 2598 int ret; 2677 2599 2678 - /* Validate the data before disabling interrupts */ 2679 - ret = timekeeping_validate_timex(txc); 2680 - if (ret) 2600 + ret = __do_adjtimex(&tk_core, txc, &result); 2601 + if (ret < 0) 2681 2602 return ret; 2682 - add_device_randomness(txc, sizeof(*txc)); 2683 2603 2684 - if (txc->modes & ADJ_SETOFFSET) { 2685 - struct 
timespec64 delta; 2604 + if (txc->modes & ADJ_SETOFFSET) 2605 + audit_tk_injoffset(result.delta); 2686 2606 2687 - delta.tv_sec = txc->time.tv_sec; 2688 - delta.tv_nsec = txc->time.tv_usec; 2689 - if (!(txc->modes & ADJ_NANO)) 2690 - delta.tv_nsec *= 1000; 2691 - ret = timekeeping_inject_offset(&delta); 2692 - if (ret) 2693 - return ret; 2607 + audit_ntp_log(&result.ad); 2694 2608 2695 - offset_set = delta.tv_sec != 0; 2696 - audit_tk_injoffset(delta); 2697 - } 2698 - 2699 - audit_ntp_init(&ad); 2700 - 2701 - ktime_get_real_ts64(&ts); 2702 - add_device_randomness(&ts, sizeof(ts)); 2703 - 2704 - scoped_guard (raw_spinlock_irqsave, &tk_core.lock) { 2705 - struct timekeeper *tks = &tk_core.shadow_timekeeper; 2706 - s32 orig_tai, tai; 2707 - 2708 - orig_tai = tai = tks->tai_offset; 2709 - ret = __do_adjtimex(txc, &ts, &tai, &ad); 2710 - 2711 - if (tai != orig_tai) { 2712 - __timekeeping_set_tai_offset(tks, tai); 2713 - timekeeping_update_from_shadow(&tk_core, TK_CLOCK_WAS_SET); 2714 - clock_set = true; 2715 - } else { 2716 - tk_update_leap_state_all(&tk_core); 2717 - } 2718 - } 2719 - 2720 - audit_ntp_log(&ad); 2721 - 2722 - /* Update the multiplier immediately if frequency was set directly */ 2723 - if (txc->modes & (ADJ_FREQUENCY | ADJ_TICK)) 2724 - clock_set |= timekeeping_advance(TK_ADV_FREQ); 2725 - 2726 - if (clock_set) 2609 + if (result.clock_set) 2727 2610 clock_was_set(CLOCK_SET_WALL); 2728 2611 2729 - ntp_notify_cmos_timer(offset_set); 2612 + ntp_notify_cmos_timer(result.delta.tv_sec != 0); 2730 2613 2731 2614 return ret; 2615 + } 2616 + 2617 + /* 2618 + * Invoked from NTP with the time keeper lock held, so lockless access is 2619 + * fine. 
2620 + */ 2621 + long ktime_get_ntp_seconds(unsigned int id) 2622 + { 2623 + return timekeeper_data[id].timekeeper.xtime_sec; 2732 2624 } 2733 2625 2734 2626 #ifdef CONFIG_NTP_PPS ··· 2773 2607 } 2774 2608 EXPORT_SYMBOL(hardpps); 2775 2609 #endif /* CONFIG_NTP_PPS */ 2610 + 2611 + #ifdef CONFIG_POSIX_AUX_CLOCKS 2612 + #include "posix-timers.h" 2613 + 2614 + /* 2615 + * Bitmap for the activated auxiliary timekeepers to allow lockless quick 2616 + * checks in the hot paths without touching extra cache lines. If set, then 2617 + * the state of the corresponding timekeeper has to be re-checked under 2618 + * timekeeper::lock. 2619 + */ 2620 + static unsigned long aux_timekeepers; 2621 + 2622 + static inline unsigned int clockid_to_tkid(unsigned int id) 2623 + { 2624 + return TIMEKEEPER_AUX_FIRST + id - CLOCK_AUX; 2625 + } 2626 + 2627 + static inline struct tk_data *aux_get_tk_data(clockid_t id) 2628 + { 2629 + if (!clockid_aux_valid(id)) 2630 + return NULL; 2631 + return &timekeeper_data[clockid_to_tkid(id)]; 2632 + } 2633 + 2634 + /* Invoked from timekeeping after a clocksource change */ 2635 + static void tk_aux_update_clocksource(void) 2636 + { 2637 + unsigned long active = READ_ONCE(aux_timekeepers); 2638 + unsigned int id; 2639 + 2640 + for_each_set_bit(id, &active, BITS_PER_LONG) { 2641 + struct tk_data *tkd = &timekeeper_data[id + TIMEKEEPER_AUX_FIRST]; 2642 + struct timekeeper *tks = &tkd->shadow_timekeeper; 2643 + 2644 + guard(raw_spinlock_irqsave)(&tkd->lock); 2645 + if (!tks->clock_valid) 2646 + continue; 2647 + 2648 + timekeeping_forward_now(tks); 2649 + tk_setup_internals(tks, tk_core.timekeeper.tkr_mono.clock); 2650 + timekeeping_update_from_shadow(tkd, TK_UPDATE_ALL); 2651 + } 2652 + } 2653 + 2654 + static void tk_aux_advance(void) 2655 + { 2656 + unsigned long active = READ_ONCE(aux_timekeepers); 2657 + unsigned int id; 2658 + 2659 + /* Lockless quick check to avoid extra cache lines */ 2660 + for_each_set_bit(id, &active, BITS_PER_LONG) { 2661 + struct 
tk_data *aux_tkd = &timekeeper_data[id + TIMEKEEPER_AUX_FIRST]; 2662 + 2663 + guard(raw_spinlock)(&aux_tkd->lock); 2664 + if (aux_tkd->shadow_timekeeper.clock_valid) 2665 + __timekeeping_advance(aux_tkd, TK_ADV_TICK); 2666 + } 2667 + } 2668 + 2669 + /** 2670 + * ktime_get_aux - Get time for a AUX clock 2671 + * @id: ID of the clock to read (CLOCK_AUX...) 2672 + * @kt: Pointer to ktime_t to store the time stamp 2673 + * 2674 + * Returns: True if the timestamp is valid, false otherwise 2675 + */ 2676 + bool ktime_get_aux(clockid_t id, ktime_t *kt) 2677 + { 2678 + struct tk_data *aux_tkd = aux_get_tk_data(id); 2679 + struct timekeeper *aux_tk; 2680 + unsigned int seq; 2681 + ktime_t base; 2682 + u64 nsecs; 2683 + 2684 + WARN_ON(timekeeping_suspended); 2685 + 2686 + if (!aux_tkd) 2687 + return false; 2688 + 2689 + aux_tk = &aux_tkd->timekeeper; 2690 + do { 2691 + seq = read_seqcount_begin(&aux_tkd->seq); 2692 + if (!aux_tk->clock_valid) 2693 + return false; 2694 + 2695 + base = ktime_add(aux_tk->tkr_mono.base, aux_tk->offs_aux); 2696 + nsecs = timekeeping_get_ns(&aux_tk->tkr_mono); 2697 + } while (read_seqcount_retry(&aux_tkd->seq, seq)); 2698 + 2699 + *kt = ktime_add_ns(base, nsecs); 2700 + return true; 2701 + } 2702 + EXPORT_SYMBOL_GPL(ktime_get_aux); 2703 + 2704 + /** 2705 + * ktime_get_aux_ts64 - Get time for a AUX clock 2706 + * @id: ID of the clock to read (CLOCK_AUX...) 
2707 + * @ts: Pointer to timespec64 to store the time stamp 2708 + * 2709 + * Returns: True if the timestamp is valid, false otherwise 2710 + */ 2711 + bool ktime_get_aux_ts64(clockid_t id, struct timespec64 *ts) 2712 + { 2713 + ktime_t now; 2714 + 2715 + if (!ktime_get_aux(id, &now)) 2716 + return false; 2717 + *ts = ktime_to_timespec64(now); 2718 + return true; 2719 + } 2720 + EXPORT_SYMBOL_GPL(ktime_get_aux_ts64); 2721 + 2722 + static int aux_get_res(clockid_t id, struct timespec64 *tp) 2723 + { 2724 + if (!clockid_aux_valid(id)) 2725 + return -ENODEV; 2726 + 2727 + tp->tv_sec = aux_clock_resolution_ns() / NSEC_PER_SEC; 2728 + tp->tv_nsec = aux_clock_resolution_ns() % NSEC_PER_SEC; 2729 + return 0; 2730 + } 2731 + 2732 + static int aux_get_timespec(clockid_t id, struct timespec64 *tp) 2733 + { 2734 + return ktime_get_aux_ts64(id, tp) ? 0 : -ENODEV; 2735 + } 2736 + 2737 + static int aux_clock_set(const clockid_t id, const struct timespec64 *tnew) 2738 + { 2739 + struct tk_data *aux_tkd = aux_get_tk_data(id); 2740 + struct timekeeper *aux_tks; 2741 + ktime_t tnow, nsecs; 2742 + 2743 + if (!timespec64_valid_settod(tnew)) 2744 + return -EINVAL; 2745 + if (!aux_tkd) 2746 + return -ENODEV; 2747 + 2748 + aux_tks = &aux_tkd->shadow_timekeeper; 2749 + 2750 + guard(raw_spinlock_irq)(&aux_tkd->lock); 2751 + if (!aux_tks->clock_valid) 2752 + return -ENODEV; 2753 + 2754 + /* Forward the timekeeper base time */ 2755 + timekeeping_forward_now(aux_tks); 2756 + /* 2757 + * Get the updated base time. tkr_mono.base has not been 2758 + * updated yet, so do that first. That makes the update 2759 + * in timekeeping_update_from_shadow() redundant, but 2760 + * that's harmless. After that @tnow can be calculated 2761 + * by using tkr_mono::cycle_last, which has been set 2762 + * by timekeeping_forward_now(). 
2763 + */ 2764 + tk_update_ktime_data(aux_tks); 2765 + nsecs = timekeeping_cycles_to_ns(&aux_tks->tkr_mono, aux_tks->tkr_mono.cycle_last); 2766 + tnow = ktime_add(aux_tks->tkr_mono.base, nsecs); 2767 + 2768 + /* 2769 + * Calculate the new AUX offset as delta to @tnow ("monotonic"). 2770 + * That avoids all the tk::xtime back and forth conversions as 2771 + * xtime ("realtime") is not applicable for auxiliary clocks and 2772 + * kept in sync with "monotonic". 2773 + */ 2774 + aux_tks->offs_aux = ktime_sub(timespec64_to_ktime(*tnew), tnow); 2775 + 2776 + timekeeping_update_from_shadow(aux_tkd, TK_UPDATE_ALL); 2777 + return 0; 2778 + } 2779 + 2780 + static int aux_clock_adj(const clockid_t id, struct __kernel_timex *txc) 2781 + { 2782 + struct tk_data *aux_tkd = aux_get_tk_data(id); 2783 + struct adjtimex_result result = { }; 2784 + 2785 + if (!aux_tkd) 2786 + return -ENODEV; 2787 + 2788 + /* 2789 + * @result is ignored for now as there are neither hrtimers nor a 2790 + * RTC related to auxiliary clocks for now. 2791 + */ 2792 + return __do_adjtimex(aux_tkd, txc, &result); 2793 + } 2794 + 2795 + const struct k_clock clock_aux = { 2796 + .clock_getres = aux_get_res, 2797 + .clock_get_timespec = aux_get_timespec, 2798 + .clock_set = aux_clock_set, 2799 + .clock_adj = aux_clock_adj, 2800 + }; 2801 + 2802 + static void aux_clock_enable(clockid_t id) 2803 + { 2804 + struct tk_read_base *tkr_raw = &tk_core.timekeeper.tkr_raw; 2805 + struct tk_data *aux_tkd = aux_get_tk_data(id); 2806 + struct timekeeper *aux_tks = &aux_tkd->shadow_timekeeper; 2807 + 2808 + /* Prevent the core timekeeper from changing. */ 2809 + guard(raw_spinlock_irq)(&tk_core.lock); 2810 + 2811 + /* 2812 + * Setup the auxiliary clock assuming that the raw core timekeeper 2813 + * clock frequency conversion is close enough. Userspace has to 2814 + * adjust for the deviation via clock_adjtime(2). 
2815 + */ 2816 + guard(raw_spinlock_nested)(&aux_tkd->lock); 2817 + 2818 + /* Remove leftovers of a previous registration */ 2819 + memset(aux_tks, 0, sizeof(*aux_tks)); 2820 + /* Restore the timekeeper id */ 2821 + aux_tks->id = aux_tkd->timekeeper.id; 2822 + /* Setup the timekeeper based on the current system clocksource */ 2823 + tk_setup_internals(aux_tks, tkr_raw->clock); 2824 + 2825 + /* Mark it valid and set it live */ 2826 + aux_tks->clock_valid = true; 2827 + timekeeping_update_from_shadow(aux_tkd, TK_UPDATE_ALL); 2828 + } 2829 + 2830 + static void aux_clock_disable(clockid_t id) 2831 + { 2832 + struct tk_data *aux_tkd = aux_get_tk_data(id); 2833 + 2834 + guard(raw_spinlock_irq)(&aux_tkd->lock); 2835 + aux_tkd->shadow_timekeeper.clock_valid = false; 2836 + timekeeping_update_from_shadow(aux_tkd, TK_UPDATE_ALL); 2837 + } 2838 + 2839 + static DEFINE_MUTEX(aux_clock_mutex); 2840 + 2841 + static ssize_t aux_clock_enable_store(struct kobject *kobj, struct kobj_attribute *attr, 2842 + const char *buf, size_t count) 2843 + { 2844 + /* Lazy atoi() as name is "0..7" */ 2845 + int id = kobj->name[0] & 0x7; 2846 + bool enable; 2847 + 2848 + if (!capable(CAP_SYS_TIME)) 2849 + return -EPERM; 2850 + 2851 + if (kstrtobool(buf, &enable) < 0) 2852 + return -EINVAL; 2853 + 2854 + guard(mutex)(&aux_clock_mutex); 2855 + if (enable == test_bit(id, &aux_timekeepers)) 2856 + return count; 2857 + 2858 + if (enable) { 2859 + aux_clock_enable(CLOCK_AUX + id); 2860 + set_bit(id, &aux_timekeepers); 2861 + } else { 2862 + aux_clock_disable(CLOCK_AUX + id); 2863 + clear_bit(id, &aux_timekeepers); 2864 + } 2865 + return count; 2866 + } 2867 + 2868 + static ssize_t aux_clock_enable_show(struct kobject *kobj, struct kobj_attribute *attr, char *buf) 2869 + { 2870 + unsigned long active = READ_ONCE(aux_timekeepers); 2871 + /* Lazy atoi() as name is "0..7" */ 2872 + int id = kobj->name[0] & 0x7; 2873 + 2874 + return sysfs_emit(buf, "%d\n", test_bit(id, &active)); 2875 + } 2876 + 2877 + 
static struct kobj_attribute aux_clock_enable_attr = __ATTR_RW(aux_clock_enable); 2878 + 2879 + static struct attribute *aux_clock_enable_attrs[] = { 2880 + &aux_clock_enable_attr.attr, 2881 + NULL 2882 + }; 2883 + 2884 + static const struct attribute_group aux_clock_enable_attr_group = { 2885 + .attrs = aux_clock_enable_attrs, 2886 + }; 2887 + 2888 + static int __init tk_aux_sysfs_init(void) 2889 + { 2890 + struct kobject *auxo, *tko = kobject_create_and_add("time", kernel_kobj); 2891 + 2892 + if (!tko) 2893 + return -ENOMEM; 2894 + 2895 + auxo = kobject_create_and_add("aux_clocks", tko); 2896 + if (!auxo) { 2897 + kobject_put(tko); 2898 + return -ENOMEM; 2899 + } 2900 + 2901 + for (int i = 0; i <= MAX_AUX_CLOCKS; i++) { 2902 + char id[2] = { [0] = '0' + i, }; 2903 + struct kobject *clk = kobject_create_and_add(id, auxo); 2904 + 2905 + if (!clk) 2906 + return -ENOMEM; 2907 + 2908 + int ret = sysfs_create_group(clk, &aux_clock_enable_attr_group); 2909 + 2910 + if (ret) 2911 + return ret; 2912 + } 2913 + return 0; 2914 + } 2915 + late_initcall(tk_aux_sysfs_init); 2916 + 2917 + static __init void tk_aux_setup(void) 2918 + { 2919 + for (int i = TIMEKEEPER_AUX_FIRST; i <= TIMEKEEPER_AUX_LAST; i++) 2920 + tkd_basic_setup(&timekeeper_data[i], i, false); 2921 + } 2922 + #endif /* CONFIG_POSIX_AUX_CLOCKS */
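A minimal model of the auxiliary-clock bookkeeping in the timekeeping.c changes above may help when reading aux_clock_set(), ktime_get_aux() and the sysfs enable path. It is a sketch under stated assumptions, not kernel code: every `model_` name is hypothetical, the clocksource, NTP adjustment and locking are left out, and the "monotonic" base is passed in as a plain nanosecond value. It shows two ideas the diff relies on: setting an aux clock only records offs_aux as the difference between the requested time and the current base, and the periodic update consults a bitmap of enabled clocks before touching any per-clock state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: mirrors MAX_AUX_CLOCKS (eight aux clocks, ids 0..7). */
#define MODEL_MAX_AUX_CLOCKS 8

/*
 * Hypothetical, simplified model of one auxiliary timekeeper. Only the
 * offset bookkeeping is modelled; clocksource, NTP and locking are omitted.
 */
struct aux_tk_model {
	bool clock_valid;	/* mirrors timekeeper::clock_valid */
	int64_t offs_aux;	/* ns added to the monotonic-style base */
};

static unsigned long model_active;	/* like the aux_timekeepers bitmap */
static struct aux_tk_model model_tk[MODEL_MAX_AUX_CLOCKS];

/* Like aux_clock_set(): store the delta between the new time and "now". */
static int model_aux_set(unsigned int id, int64_t mono_now_ns, int64_t new_ns)
{
	if (id >= MODEL_MAX_AUX_CLOCKS || !model_tk[id].clock_valid)
		return -1;	/* the kernel returns -ENODEV here */
	model_tk[id].offs_aux = new_ns - mono_now_ns;
	return 0;
}

/* Like ktime_get_aux(): base plus offset, or false if the clock is disabled. */
static bool model_aux_get(unsigned int id, int64_t mono_now_ns, int64_t *out)
{
	if (id >= MODEL_MAX_AUX_CLOCKS || !model_tk[id].clock_valid)
		return false;
	*out = mono_now_ns + model_tk[id].offs_aux;
	return true;
}

/* Like the sysfs store path: flip the valid flag and the bitmap bit. */
static void model_aux_enable(unsigned int id, bool enable)
{
	assert(id < MODEL_MAX_AUX_CLOCKS);
	if (enable) {
		/* Reset state, loosely mirroring the memset() in aux_clock_enable() */
		model_tk[id].offs_aux = 0;
		model_tk[id].clock_valid = true;
		model_active |= 1UL << id;
	} else {
		model_tk[id].clock_valid = false;
		model_active &= ~(1UL << id);
	}
}

/* Like tk_aux_advance(): only clocks whose bit is set are visited. */
static unsigned int model_count_active(void)
{
	unsigned int n = 0;

	for (unsigned int id = 0; id < MODEL_MAX_AUX_CLOCKS; id++)
		if (model_active & (1UL << id))
			n++;
	return n;
}
```

Keeping an offset instead of rewriting xtime follows the comment in aux_clock_set() that xtime ("realtime") is not applicable to auxiliary clocks; a later read simply adds the stored offset to the advancing base.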
+3
kernel/time/timekeeping_internal.h
···
  45   45   unsigned long timekeeper_lock_irqsave(void);
  46   46   void timekeeper_unlock_irqrestore(unsigned long flags);
  47   47
       48 + /* NTP specific interface to access the current seconds value */
       49 + long ktime_get_ntp_seconds(unsigned int id);
       50 +
  48   51   #endif /* _TIMEKEEPING_INTERNAL_H */
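The per-timekeeper `id` now passed to ntp_tick_length(), second_overflow() and ktime_get_ntp_seconds() indexes timekeeper_data[], which holds the core timekeeper and the auxiliary ones. The sketch below reproduces the clockid-to-index arithmetic of clockid_to_tkid(), the `& 0x7` directory-name trick from aux_clock_enable_store(), and the aux-only checks added to timekeeping_validate_timex(). All `M_`/`m_` names are hypothetical. The numeric values are assumptions for illustration: CLOCK_AUX is assumed to start at 16 with eight aux clocks, the first aux timekeeper index is assumed to be 1, and the status/mode bits are chosen to match the Linux UAPI timex definitions (verify against <linux/timex.h>).

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values, for illustration only; check the kernel headers. */
#define M_CLOCK_AUX            16
#define M_MAX_AUX_CLOCKS       8
#define M_CLOCK_AUX_LAST       (M_CLOCK_AUX + M_MAX_AUX_CLOCKS - 1)
#define M_TIMEKEEPER_AUX_FIRST 1

/* Status/mode bits, values chosen to match the Linux UAPI timex header. */
#define M_STA_PPSFREQ 0x0002
#define M_STA_PPSTIME 0x0004
#define M_STA_INS     0x0010
#define M_STA_DEL     0x0020
#define M_ADJ_TAI     0x0080

static bool m_clockid_aux_valid(int id)
{
	return id >= M_CLOCK_AUX && id <= M_CLOCK_AUX_LAST;
}

/* Same arithmetic as clockid_to_tkid(): aux clockid -> timekeeper_data[] index */
static unsigned int m_clockid_to_tkid(int id)
{
	return M_TIMEKEEPER_AUX_FIRST + id - M_CLOCK_AUX;
}

/* Same trick as aux_clock_enable_store(): '0'..'7' are 0x30..0x37 */
static int m_sysfs_name_to_index(const char *name)
{
	return name[0] & 0x7;
}

/* The aux-only part of timekeeping_validate_timex(); -1 stands for -EINVAL */
static int m_validate_aux_timex(unsigned int modes, int status)
{
	if (status & (M_STA_INS | M_STA_DEL))
		return -1;	/* aux clocks are TAI-like: no leap seconds */
	if (modes & M_ADJ_TAI)
		return -1;	/* no TAI offset setting */
	if (status & (M_STA_PPSFREQ | M_STA_PPSTIME))
		return -1;	/* no PPS support */
	return 0;
}
```

With eight auxiliary clocks the directory names are "0" to "7"; their ASCII codes 0x30 to 0x37 carry the clock number in the low three bits, which is what lets the lazy atoi() work.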
+55 -15
kernel/time/vsyscall.c
···
  15   15
  16   16   #include "timekeeping_internal.h"
  17   17
       18 + static inline void fill_clock_configuration(struct vdso_clock *vc, const struct tk_read_base *base)
       19 + {
       20 + 	vc->cycle_last = base->cycle_last;
       21 + #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
       22 + 	vc->max_cycles = base->clock->max_cycles;
       23 + #endif
       24 + 	vc->mask = base->mask;
       25 + 	vc->mult = base->mult;
       26 + 	vc->shift = base->shift;
       27 + }
       28 +
  18   29   static inline void update_vdso_time_data(struct vdso_time_data *vdata, struct timekeeper *tk)
  19   30   {
  20   31   	struct vdso_clock *vc = vdata->clock_data;
  21   32   	struct vdso_timestamp *vdso_ts;
  22   33   	u64 nsec, sec;
  23   34
  24      - 	vc[CS_HRES_COARSE].cycle_last = tk->tkr_mono.cycle_last;
  25      - #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
  26      - 	vc[CS_HRES_COARSE].max_cycles = tk->tkr_mono.clock->max_cycles;
  27      - #endif
  28      - 	vc[CS_HRES_COARSE].mask = tk->tkr_mono.mask;
  29      - 	vc[CS_HRES_COARSE].mult = tk->tkr_mono.mult;
  30      - 	vc[CS_HRES_COARSE].shift = tk->tkr_mono.shift;
  31      - 	vc[CS_RAW].cycle_last = tk->tkr_raw.cycle_last;
  32      - #ifdef CONFIG_GENERIC_VDSO_OVERFLOW_PROTECT
  33      - 	vc[CS_RAW].max_cycles = tk->tkr_raw.clock->max_cycles;
  34      - #endif
  35      - 	vc[CS_RAW].mask = tk->tkr_raw.mask;
  36      - 	vc[CS_RAW].mult = tk->tkr_raw.mult;
  37      - 	vc[CS_RAW].shift = tk->tkr_raw.shift;
       35 + 	fill_clock_configuration(&vc[CS_HRES_COARSE], &tk->tkr_mono);
       36 + 	fill_clock_configuration(&vc[CS_RAW], &tk->tkr_raw);
  38   37
  39   38   	/* CLOCK_MONOTONIC */
  40   39   	vdso_ts = &vc[CS_HRES_COARSE].basetime[CLOCK_MONOTONIC];
···
 118  119   	if (clock_mode != VDSO_CLOCKMODE_NONE)
 119  120   		update_vdso_time_data(vdata, tk);
 120  121
 121      - 	__arch_update_vsyscall(vdata);
      122 + 	__arch_update_vdso_clock(&vc[CS_HRES_COARSE]);
      123 + 	__arch_update_vdso_clock(&vc[CS_RAW]);
 122  124
 123  125   	vdso_write_end(vdata);
 124  126
···
 135  135
 136  136   	__arch_sync_vdso_time_data(vdata);
 137  137   }
      138 +
      139 + #ifdef CONFIG_POSIX_AUX_CLOCKS
      140 + void vdso_time_update_aux(struct timekeeper *tk)
      141 + {
      142 + 	struct vdso_time_data *vdata = vdso_k_time_data;
      143 + 	struct vdso_timestamp *vdso_ts;
      144 + 	struct vdso_clock *vc;
      145 + 	s32 clock_mode;
      146 + 	u64 nsec;
      147 +
      148 + 	vc = &vdata->aux_clock_data[tk->id - TIMEKEEPER_AUX_FIRST];
      149 + 	vdso_ts = &vc->basetime[VDSO_BASE_AUX];
      150 + 	clock_mode = tk->tkr_mono.clock->vdso_clock_mode;
      151 + 	if (!tk->clock_valid)
      152 + 		clock_mode = VDSO_CLOCKMODE_NONE;
      153 +
      154 + 	/* copy vsyscall data */
      155 + 	vdso_write_begin_clock(vc);
      156 +
      157 + 	vc->clock_mode = clock_mode;
      158 +
      159 + 	if (clock_mode != VDSO_CLOCKMODE_NONE) {
      160 + 		fill_clock_configuration(vc, &tk->tkr_mono);
      161 +
      162 + 		vdso_ts->sec = tk->xtime_sec;
      163 +
      164 + 		nsec = tk->tkr_mono.xtime_nsec >> tk->tkr_mono.shift;
      165 + 		nsec += tk->offs_aux;
      166 + 		vdso_ts->sec += __iter_div_u64_rem(nsec, NSEC_PER_SEC, &nsec);
      167 + 		nsec = nsec << tk->tkr_mono.shift;
      168 + 		vdso_ts->nsec = nsec;
      169 + 	}
      170 +
      171 + 	__arch_update_vdso_clock(vc);
      172 +
      173 + 	vdso_write_end_clock(vc);
      174 +
      175 + 	__arch_sync_vdso_time_data(vdata);
      176 + }
      177 + #endif
 138  178
 139  179   /**
 140  180    * vdso_update_begin - Start of a VDSO update section
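The nanosecond handling in vdso_time_update_aux() can be checked in isolation. The vDSO keeps basetime nanoseconds left-shifted by the clocksource shift, so the update un-shifts tkr_mono.xtime_nsec, adds offs_aux, carries whole seconds into sec, and shifts back. The sketch below repeats that arithmetic with a portable stand-in for __iter_div_u64_rem() (division by repeated subtraction). The `m_` names are hypothetical, and unlike the kernel's signed ktime_t offset, the sketch assumes a non-negative offset.

```c
#include <assert.h>
#include <stdint.h>

#define M_NSEC_PER_SEC 1000000000U

/*
 * Portable stand-in for the kernel's __iter_div_u64_rem(): division by
 * repeated subtraction, cheap only while the quotient stays small.
 */
static uint32_t m_iter_div_u64_rem(uint64_t dividend, uint32_t divisor, uint64_t *remainder)
{
	uint32_t quotient = 0;

	while (dividend >= divisor) {
		dividend -= divisor;
		quotient++;
	}
	*remainder = dividend;
	return quotient;
}

/* Hypothetical stand-in for the two vdso_timestamp fields written here. */
struct m_vdso_ts {
	uint64_t sec;
	uint64_t nsec;	/* left-shifted by the clocksource shift */
};

/*
 * Same arithmetic as the clock_mode != VDSO_CLOCKMODE_NONE branch of
 * vdso_time_update_aux(), assuming a non-negative offs_aux.
 */
static struct m_vdso_ts m_aux_basetime(uint64_t xtime_sec, uint64_t xtime_nsec_shifted,
				       uint32_t shift, uint64_t offs_aux_ns)
{
	struct m_vdso_ts ts;
	uint64_t nsec;

	ts.sec = xtime_sec;
	nsec = xtime_nsec_shifted >> shift;	/* back to plain nanoseconds */
	nsec += offs_aux_ns;			/* apply the aux clock offset */
	ts.sec += m_iter_div_u64_rem(nsec, M_NSEC_PER_SEC, &nsec);	/* carry seconds */
	ts.nsec = nsec << shift;		/* re-shift for the vDSO reader */
	return ts;
}
```

Because the stand-in loops once per carried second, it stays cheap only while the un-shifted nanoseconds plus the offset amount to a few seconds; a large offset turns the carry into a long loop.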
+140 -84
lib/vdso/gettimeofday.c
··· 2 2 /* 3 3 * Generic userspace implementations of gettimeofday() and similar. 4 4 */ 5 + #include <vdso/auxclock.h> 5 6 #include <vdso/datapage.h> 6 7 #include <vdso/helpers.h> 7 8 ··· 72 71 } 73 72 #endif 74 73 74 + static __always_inline bool vdso_clockid_valid(clockid_t clock) 75 + { 76 + /* Check for negative values or invalid clocks */ 77 + return likely((u32) clock <= CLOCK_AUX_LAST); 78 + } 79 + 80 + /* 81 + * Must not be invoked within the sequence read section as a race inside 82 + * that loop could result in __iter_div_u64_rem() being extremely slow. 83 + */ 84 + static __always_inline void vdso_set_timespec(struct __kernel_timespec *ts, u64 sec, u64 ns) 85 + { 86 + ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns); 87 + ts->tv_nsec = ns; 88 + } 89 + 90 + static __always_inline 91 + bool vdso_get_timestamp(const struct vdso_time_data *vd, const struct vdso_clock *vc, 92 + unsigned int clkidx, u64 *sec, u64 *ns) 93 + { 94 + const struct vdso_timestamp *vdso_ts = &vc->basetime[clkidx]; 95 + u64 cycles; 96 + 97 + if (unlikely(!vdso_clocksource_ok(vc))) 98 + return false; 99 + 100 + cycles = __arch_get_hw_counter(vc->clock_mode, vd); 101 + if (unlikely(!vdso_cycles_ok(cycles))) 102 + return false; 103 + 104 + *ns = vdso_calc_ns(vc, cycles, vdso_ts->nsec); 105 + *sec = vdso_ts->sec; 106 + 107 + return true; 108 + } 109 + 75 110 #ifdef CONFIG_TIME_NS 76 111 77 112 #ifdef CONFIG_GENERIC_VDSO_DATA_STORE ··· 119 82 #endif /* CONFIG_GENERIC_VDSO_DATA_STORE */ 120 83 121 84 static __always_inline 122 - int do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 123 - clockid_t clk, struct __kernel_timespec *ts) 85 + bool do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 86 + clockid_t clk, struct __kernel_timespec *ts) 124 87 { 125 88 const struct vdso_time_data *vd = __arch_get_vdso_u_timens_data(vdns); 126 89 const struct timens_offset *offs = &vcns->offset[clk]; 127 90 const struct vdso_clock 
*vc = vd->clock_data; 128 - const struct vdso_timestamp *vdso_ts; 129 - u64 cycles, ns; 130 91 u32 seq; 131 92 s64 sec; 93 + u64 ns; 132 94 133 95 if (clk != CLOCK_MONOTONIC_RAW) 134 96 vc = &vc[CS_HRES_COARSE]; 135 97 else 136 98 vc = &vc[CS_RAW]; 137 - vdso_ts = &vc->basetime[clk]; 138 99 139 100 do { 140 101 seq = vdso_read_begin(vc); 141 102 142 - if (unlikely(!vdso_clocksource_ok(vc))) 143 - return -1; 144 - 145 - cycles = __arch_get_hw_counter(vc->clock_mode, vd); 146 - if (unlikely(!vdso_cycles_ok(cycles))) 147 - return -1; 148 - ns = vdso_calc_ns(vc, cycles, vdso_ts->nsec); 149 - sec = vdso_ts->sec; 103 + if (!vdso_get_timestamp(vd, vc, clk, &sec, &ns)) 104 + return false; 150 105 } while (unlikely(vdso_read_retry(vc, seq))); 151 106 152 107 /* Add the namespace offset */ 153 108 sec += offs->sec; 154 109 ns += offs->nsec; 155 110 156 - /* 157 - * Do this outside the loop: a race inside the loop could result 158 - * in __iter_div_u64_rem() being extremely slow. 159 - */ 160 - ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns); 161 - ts->tv_nsec = ns; 111 + vdso_set_timespec(ts, sec, ns); 162 112 163 - return 0; 113 + return true; 164 114 } 165 115 #else 166 116 static __always_inline ··· 157 133 } 158 134 159 135 static __always_inline 160 - int do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 161 - clockid_t clk, struct __kernel_timespec *ts) 136 + bool do_hres_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 137 + clockid_t clk, struct __kernel_timespec *ts) 162 138 { 163 - return -EINVAL; 139 + return false; 164 140 } 165 141 #endif 166 142 167 143 static __always_inline 168 - int do_hres(const struct vdso_time_data *vd, const struct vdso_clock *vc, 169 - clockid_t clk, struct __kernel_timespec *ts) 144 + bool do_hres(const struct vdso_time_data *vd, const struct vdso_clock *vc, 145 + clockid_t clk, struct __kernel_timespec *ts) 170 146 { 171 - const struct vdso_timestamp *vdso_ts = 
&vc->basetime[clk]; 172 - u64 cycles, sec, ns; 147 + u64 sec, ns; 173 148 u32 seq; 174 149 175 150 /* Allows to compile the high resolution parts out */ 176 151 if (!__arch_vdso_hres_capable()) 177 - return -1; 152 + return false; 178 153 179 154 do { 180 155 /* ··· 195 172 } 196 173 smp_rmb(); 197 174 198 - if (unlikely(!vdso_clocksource_ok(vc))) 199 - return -1; 200 - 201 - cycles = __arch_get_hw_counter(vc->clock_mode, vd); 202 - if (unlikely(!vdso_cycles_ok(cycles))) 203 - return -1; 204 - ns = vdso_calc_ns(vc, cycles, vdso_ts->nsec); 205 - sec = vdso_ts->sec; 175 + if (!vdso_get_timestamp(vd, vc, clk, &sec, &ns)) 176 + return false; 206 177 } while (unlikely(vdso_read_retry(vc, seq))); 207 178 208 - /* 209 - * Do this outside the loop: a race inside the loop could result 210 - * in __iter_div_u64_rem() being extremely slow. 211 - */ 212 - ts->tv_sec = sec + __iter_div_u64_rem(ns, NSEC_PER_SEC, &ns); 213 - ts->tv_nsec = ns; 179 + vdso_set_timespec(ts, sec, ns); 214 180 215 - return 0; 181 + return true; 216 182 } 217 183 218 184 #ifdef CONFIG_TIME_NS 219 185 static __always_inline 220 - int do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 221 - clockid_t clk, struct __kernel_timespec *ts) 186 + bool do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 187 + clockid_t clk, struct __kernel_timespec *ts) 222 188 { 223 189 const struct vdso_time_data *vd = __arch_get_vdso_u_timens_data(vdns); 224 190 const struct timens_offset *offs = &vcns->offset[clk]; ··· 229 217 sec += offs->sec; 230 218 nsec += offs->nsec; 231 219 232 - /* 233 - * Do this outside the loop: a race inside the loop could result 234 - * in __iter_div_u64_rem() being extremely slow. 
235 - */ 236 - ts->tv_sec = sec + __iter_div_u64_rem(nsec, NSEC_PER_SEC, &nsec); 237 - ts->tv_nsec = nsec; 238 - return 0; 220 + vdso_set_timespec(ts, sec, nsec); 221 + 222 + return true; 239 223 } 240 224 #else 241 225 static __always_inline 242 - int do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 243 - clockid_t clk, struct __kernel_timespec *ts) 226 + bool do_coarse_timens(const struct vdso_time_data *vdns, const struct vdso_clock *vcns, 227 + clockid_t clk, struct __kernel_timespec *ts) 244 228 { 245 - return -1; 229 + return false; 246 230 } 247 231 #endif 248 232 249 233 static __always_inline 250 - int do_coarse(const struct vdso_time_data *vd, const struct vdso_clock *vc, 251 - clockid_t clk, struct __kernel_timespec *ts) 234 + bool do_coarse(const struct vdso_time_data *vd, const struct vdso_clock *vc, 235 + clockid_t clk, struct __kernel_timespec *ts) 252 236 { 253 237 const struct vdso_timestamp *vdso_ts = &vc->basetime[clk]; 254 238 u32 seq; ··· 266 258 ts->tv_nsec = vdso_ts->nsec; 267 259 } while (unlikely(vdso_read_retry(vc, seq))); 268 260 269 - return 0; 261 + return true; 270 262 } 271 263 272 - static __always_inline int 264 + static __always_inline 265 + bool do_aux(const struct vdso_time_data *vd, clockid_t clock, struct __kernel_timespec *ts) 266 + { 267 + const struct vdso_clock *vc; 268 + u32 seq, idx; 269 + u64 sec, ns; 270 + 271 + if (!IS_ENABLED(CONFIG_POSIX_AUX_CLOCKS)) 272 + return false; 273 + 274 + idx = clock - CLOCK_AUX; 275 + vc = &vd->aux_clock_data[idx]; 276 + 277 + do { 278 + /* 279 + * Open coded function vdso_read_begin() to handle 280 + * VDSO_CLOCK_TIMENS. See comment in do_hres(). 
281 + */ 282 + while ((seq = READ_ONCE(vc->seq)) & 1) { 283 + if (IS_ENABLED(CONFIG_TIME_NS) && vc->clock_mode == VDSO_CLOCKMODE_TIMENS) { 284 + vd = __arch_get_vdso_u_timens_data(vd); 285 + vc = &vd->aux_clock_data[idx]; 286 + /* Re-read from the real time data page */ 287 + continue; 288 + } 289 + cpu_relax(); 290 + } 291 + smp_rmb(); 292 + 293 + /* Auxclock disabled? */ 294 + if (vc->clock_mode == VDSO_CLOCKMODE_NONE) 295 + return false; 296 + 297 + if (!vdso_get_timestamp(vd, vc, VDSO_BASE_AUX, &sec, &ns)) 298 + return false; 299 + } while (unlikely(vdso_read_retry(vc, seq))); 300 + 301 + vdso_set_timespec(ts, sec, ns); 302 + 303 + return true; 304 + } 305 + 306 + static __always_inline bool 273 307 __cvdso_clock_gettime_common(const struct vdso_time_data *vd, clockid_t clock, 274 308 struct __kernel_timespec *ts) 275 309 { 276 310 const struct vdso_clock *vc = vd->clock_data; 277 311 u32 msk; 278 312 279 - /* Check for negative values or invalid clocks */ 280 - if (unlikely((u32) clock >= MAX_CLOCKS)) 281 - return -1; 313 + if (!vdso_clockid_valid(clock)) 314 + return false; 282 315 283 316 /* 284 317 * Convert the clockid to a bitmask and use it to check which ··· 332 283 return do_coarse(vd, &vc[CS_HRES_COARSE], clock, ts); 333 284 else if (msk & VDSO_RAW) 334 285 vc = &vc[CS_RAW]; 286 + else if (msk & VDSO_AUX) 287 + return do_aux(vd, clock, ts); 335 288 else 336 - return -1; 289 + return false; 337 290 338 291 return do_hres(vd, vc, clock, ts); 339 292 } ··· 344 293 __cvdso_clock_gettime_data(const struct vdso_time_data *vd, clockid_t clock, 345 294 struct __kernel_timespec *ts) 346 295 { 347 - int ret = __cvdso_clock_gettime_common(vd, clock, ts); 296 + bool ok; 348 297 349 - if (unlikely(ret)) 298 + ok = __cvdso_clock_gettime_common(vd, clock, ts); 299 + 300 + if (unlikely(!ok)) 350 301 return clock_gettime_fallback(clock, ts); 351 302 return 0; 352 303 } ··· 365 312 struct old_timespec32 *res) 366 313 { 367 314 struct __kernel_timespec ts; 368 - int 
ret; 315 + bool ok; 369 316 370 - ret = __cvdso_clock_gettime_common(vd, clock, &ts); 317 + ok = __cvdso_clock_gettime_common(vd, clock, &ts); 371 318 372 - if (unlikely(ret)) 319 + if (unlikely(!ok)) 373 320 return clock_gettime32_fallback(clock, res); 374 321 375 - /* For ret == 0 */ 322 + /* For ok == true */ 376 323 res->tv_sec = ts.tv_sec; 377 324 res->tv_nsec = ts.tv_nsec; 378 325 379 - return ret; 326 + return 0; 380 327 } 381 328 382 329 static __maybe_unused int ··· 395 342 if (likely(tv != NULL)) { 396 343 struct __kernel_timespec ts; 397 344 398 - if (do_hres(vd, &vc[CS_HRES_COARSE], CLOCK_REALTIME, &ts)) 345 + if (!do_hres(vd, &vc[CS_HRES_COARSE], CLOCK_REALTIME, &ts)) 399 346 return gettimeofday_fallback(tv, tz); 400 347 401 348 tv->tv_sec = ts.tv_sec; ··· 449 396 450 397 #ifdef VDSO_HAS_CLOCK_GETRES 451 398 static __maybe_unused 452 - int __cvdso_clock_getres_common(const struct vdso_time_data *vd, clockid_t clock, 453 - struct __kernel_timespec *res) 399 + bool __cvdso_clock_getres_common(const struct vdso_time_data *vd, clockid_t clock, 400 + struct __kernel_timespec *res) 454 401 { 455 402 const struct vdso_clock *vc = vd->clock_data; 456 403 u32 msk; 457 404 u64 ns; 458 405 459 - /* Check for negative values or invalid clocks */ 460 - if (unlikely((u32) clock >= MAX_CLOCKS)) 461 - return -1; 406 + if (!vdso_clockid_valid(clock)) 407 + return false; 462 408 463 409 if (IS_ENABLED(CONFIG_TIME_NS) && 464 410 vc->clock_mode == VDSO_CLOCKMODE_TIMENS) ··· 478 426 * Preserves the behaviour of posix_get_coarse_res(). 
479 427 */ 480 428 ns = LOW_RES_NSEC; 429 + } else if (msk & VDSO_AUX) { 430 + ns = aux_clock_resolution_ns(); 481 431 } else { 482 - return -1; 432 + return false; 483 433 } 484 434 485 435 if (likely(res)) { 486 436 res->tv_sec = 0; 487 437 res->tv_nsec = ns; 488 438 } 489 - return 0; 439 + return true; 490 440 } 491 441 492 442 static __maybe_unused 493 443 int __cvdso_clock_getres_data(const struct vdso_time_data *vd, clockid_t clock, 494 444 struct __kernel_timespec *res) 495 445 { 496 - int ret = __cvdso_clock_getres_common(vd, clock, res); 446 + bool ok; 497 447 498 - if (unlikely(ret)) 448 + ok = __cvdso_clock_getres_common(vd, clock, res); 449 + 450 + if (unlikely(!ok)) 499 451 return clock_getres_fallback(clock, res); 500 452 return 0; 501 453 } ··· 516 460 struct old_timespec32 *res) 517 461 { 518 462 struct __kernel_timespec ts; 519 - int ret; 463 + bool ok; 520 464 521 - ret = __cvdso_clock_getres_common(vd, clock, &ts); 465 + ok = __cvdso_clock_getres_common(vd, clock, &ts); 522 466 523 - if (unlikely(ret)) 467 + if (unlikely(!ok)) 524 468 return clock_getres32_fallback(clock, res); 525 469 526 470 if (likely(res)) { 527 471 res->tv_sec = ts.tv_sec; 528 472 res->tv_nsec = ts.tv_nsec; 529 473 } 530 - return ret; 474 + return 0; 531 475 } 532 476 533 477 static __maybe_unused int
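The gettimeofday.c changes switch the internal helpers from int (0 or -1) to bool and route every failure through the syscall fallback. The sketch below shows that shape together with the sequence-count read pattern used by do_hres() and do_aux(): spin while the count is odd, read, retry if the count changed, then normalise the result after the loop. It uses C11 atomics on a hypothetical single-clock structure; the `m_` names, M_CLOCK_AUX_LAST = 23 and the "fast path answered" boolean are assumptions, and it models neither the hardware counter nor the time-namespace redirection.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define M_NSEC_PER_SEC   1000000000ULL
#define M_CLOCK_AUX_LAST 23	/* assumed value, for illustration only */

/* Hypothetical per-clock data, loosely modelled on struct vdso_clock. */
struct m_clock {
	_Atomic uint32_t seq;	/* odd while the writer is updating */
	_Atomic uint64_t sec;
	_Atomic uint64_t nsec;	/* may exceed one second; readers normalise */
	_Atomic bool enabled;	/* stands in for clock_mode != VDSO_CLOCKMODE_NONE */
};

static void m_init(struct m_clock *c)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->sec, 0);
	atomic_init(&c->nsec, 0);
	atomic_init(&c->enabled, false);
}

/* Writer side; assumes a single writer (the kernel serialises updates). */
static void m_write(struct m_clock *c, uint64_t sec, uint64_t nsec, bool enabled)
{
	uint32_t seq = atomic_load_explicit(&c->seq, memory_order_relaxed);

	atomic_store_explicit(&c->seq, seq + 1, memory_order_relaxed);	/* odd */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&c->sec, sec, memory_order_relaxed);
	atomic_store_explicit(&c->nsec, nsec, memory_order_relaxed);
	atomic_store_explicit(&c->enabled, enabled, memory_order_relaxed);
	atomic_store_explicit(&c->seq, seq + 2, memory_order_release);	/* even */
}

/* Negative ids wrap to large unsigned values, so one compare rejects both. */
static bool m_clockid_valid(int clock)
{
	return (uint32_t)clock <= M_CLOCK_AUX_LAST;
}

/* Reader side: returns false instead of -1 when the fast path cannot answer. */
static bool m_read(struct m_clock *c, uint64_t *sec, uint64_t *nsec)
{
	uint64_t s, ns;
	uint32_t seq;

	do {
		/* Spin while an update is in progress (the vDSO uses cpu_relax()) */
		while ((seq = atomic_load_explicit(&c->seq, memory_order_acquire)) & 1)
			;
		if (!atomic_load_explicit(&c->enabled, memory_order_relaxed))
			return false;
		s = atomic_load_explicit(&c->sec, memory_order_relaxed);
		ns = atomic_load_explicit(&c->nsec, memory_order_relaxed);
		atomic_thread_fence(memory_order_acquire);
	} while (atomic_load_explicit(&c->seq, memory_order_relaxed) != seq);

	/* Normalise after the loop, as the vDSO does; plain division here */
	*sec = s + ns / M_NSEC_PER_SEC;
	*nsec = ns % M_NSEC_PER_SEC;
	return true;
}

/* True if the fast path answered; false means "take the syscall fallback". */
static bool m_gettime(struct m_clock *c, int clock, uint64_t *sec, uint64_t *nsec)
{
	if (!m_clockid_valid(clock))
		return false;
	return m_read(c, sec, nsec);
}
```

The unsigned cast in m_clockid_valid() follows the same idea as vdso_clockid_valid(): a negative clockid becomes a very large u32, so a single comparison rejects both negative and out-of-range ids.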