The aim of this study was to determine whether different methods of binary seasonality classification are concordant when applied to time series derived from diagnostic codes in observational data. We used databases of different sizes, types, and provenances to rule out the possibility that discrepancies arise merely from the choice of database. The results of this study, as shown in Figure 1, indicate that the methods are generally inconsistent with each other, with discrepancies observed in 60–80% of time series in the 10 databases. As Tables 3, 4, and 5 reveal, the methods show considerable variation within each database, even when only the proportion of time series classified as seasonal is taken into account. The existence of this variation in all databases and at all significance levels indicates that the source of the variation is not the data, but the methods themselves.
Sources of discord
Ultimately, the discord stems from the different ways in which the methods assess seasonality. Although similarities exist, each method focuses on a different aspect of a time series (Table 2). For example, half of the methods (ET, AA, AR, ED) fit a hypothetical model to the time series and check the seasonality of the model, while the other half (FR, KW, WE, QS) test different aspects of the time series directly, without using a hypothetical model. To take the discussion further and generalize where we can, we distinguish between types of concordance and types of peaks. In terms of concordance, we define “positive agreement” as unanimous agreement between the methods that a time series is seasonal, and “negative agreement” as unanimous agreement that a time series is non-seasonal. For a given time series, the methods are therefore discordant when there is neither positive nor negative agreement. As for peaks, we say that peaks are “persistent” if they occur year after year, and “consistent” if they occur in the same month. We make this distinction because peaks are related to two aspects of time series analysis that are relevant to seasonality: variation and autocorrelation. Peaks can, of course, have different sizes. Time series with large peaks suggest greater variation than those with small peaks. Persistent peaks (whether small or large) suggest the possibility of cyclical behavior underlying the time series. Consistent peaks, insofar as they are consistent, indicate autocorrelation in the time series. We will use Figures 2 and 3 to navigate the rest of the discussion.
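The three outcomes defined above can be expressed compactly. The following is a minimal sketch, not the code used in the study: the function name `agreement_type` and the example classifications are hypothetical, and each method is assumed to return a single True (seasonal) or False (non-seasonal) call per time series.

```python
from typing import Dict


def agreement_type(classifications: Dict[str, bool]) -> str:
    """Label one time series by how the methods' binary calls line up.

    `classifications` maps a method label to True (seasonal) or
    False (non-seasonal). The labels are illustrative only.
    """
    votes = list(classifications.values())
    if not votes:
        raise ValueError("need at least one classification")
    if all(votes):
        return "positive agreement"   # unanimous: seasonal
    if not any(votes):
        return "negative agreement"   # unanimous: non-seasonal
    return "discordant"               # at least one method dissents


# Hypothetical calls for a single time series: seven methods say
# seasonal, one (AR) says non-seasonal.
example = {"ET": True, "AA": True, "AR": False, "ED": True,
           "FR": True, "KW": True, "WE": True, "QS": True}
print(agreement_type(example))  # discordant
```

Under this definition a single dissenting method is enough to make a time series discordant, which is why unanimous agreement becomes harder to reach as more methods are compared.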
From Fig3.ts1 (N = 2809) and Fig3.ts9 (N = 1498), we learn that the methods show concordance only 4307 / 11,137 = 38.7% of the time. Figure 2 provides insight into the extent of the discord between the methods. Of the 40 unique combinations, some occur more frequently than others, and this is due to similarities in the test procedures (Table 2). For example, methods that group time series data by month and check for differences between groups evaluate seasonality differently than methods that fit a hypothetical model and then determine seasonality by minimizing the forecast error. Recognizing the differences in how the methods assess seasonality is important not only for understanding the amount of discord observed, but also for recognizing that these differences reflect a disagreement about how seasonality is defined. In fact, if the methods were highly concordant despite their contrasting approaches, we would have to conclude that the contrasting approaches are ultimately different ways of expressing the same aspect of a time series. This can be observed more clearly by exploring Figure 3. In Fig3.ts1,…, Fig3.ts4 we observe time series that, to the human eye, appear seasonal and very similar. Identifying such time series as seasonal is a very old idea in time series analysis, with Beveridge [24] and Yule [25] using harmonic functions to model time series with cyclic behavior. However, despite an obvious cyclic pattern and visual similarities, Fig3.ts2, Fig3.ts3, and Fig3.ts4 show discord. The reason is that, except for the ED method, the methods do not test for seasonality by fitting harmonic functions to the data. Thus, different methods of assessing seasonality result in different definitions of seasonality.
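The concordance figure above is simple arithmetic on the counts reported for Fig3.ts1 and Fig3.ts9, and the combinations in Figure 2 are tallies of identical classification patterns. A minimal sketch follows; the helper names and the three-method toy data are hypothetical, and only the counts 2809, 1498, and 11,137 come from the text. With eight binary methods, at most 2^8 = 256 distinct patterns are possible.

```python
from collections import Counter


def pattern_counts(results):
    """Count how often each tuple of per-method calls occurs."""
    return Counter(tuple(r) for r in results)


def concordance_rate(n_positive, n_negative, n_total):
    """Share of time series on which all methods agree."""
    return (n_positive + n_negative) / n_total


# Counts reported in the text: positive agreement, negative agreement, total.
rate = concordance_rate(2809, 1498, 11137)
print(f"{rate:.1%}")  # 38.7%

# Toy illustration with three hypothetical methods and four time series.
toy_results = [
    (True, True, True),
    (True, False, True),
    (True, True, True),
    (False, False, False),
]
counts = pattern_counts(toy_results)
print(len(counts))  # 3 distinct patterns
```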
As mentioned above, the behavior of peaks plays an important role in concordance. We will use Figure 3 further to explore the relationship between peaks, variation, and discord, and to provide general principles about when a method might classify a time series as seasonal rather than non-seasonal.
Positive agreement
Because each method evaluates seasonality differently, positive agreement is achieved only when multiple conditions hold simultaneously. Persistent and consistent peaks are the most important for ED, AA, AR, and ET. Peaks will result in a seasonal classification by ED as long as there is a sufficient difference between the peaks and the lows of the data. However, even with persistent and consistent peaks, variation over time (especially between peaks) can lead to a non-seasonal classification by AA, AR, or ET (Fig3.ts2, Fig3.ts3, and Fig3.ts4). In fact, we have confirmed experimentally that we can achieve positive agreement for the time series of Fig3.ts2, Fig3.ts3, and Fig3.ts4 by deleting the data prior to 2016. Because time series with persistent and consistent peaks have a high correlation at seasonal lags, they will be classified as seasonal by QS. For FR, KW, and WE, the most important factor is variation. Even in the absence of the prominent peaks we see in Fig3.ts1,…, Fig3.ts4, sufficient variation in the time series data can lead FR, KW, and WE to a seasonal classification (Fig3.ts6). Therefore, in terms of positive agreement, we see tension between the methods: variation can cause some methods to classify apparently seasonal time series as non-seasonal (Fig3.ts2, Fig3.ts3, and Fig3.ts4) and apparently non-seasonal time series as seasonal (Fig3.ts5,…, Fig3.ts8).
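To make the notions of persistent and consistent peaks concrete, the sketch below locates the position of the yearly maximum in a monthly series and measures how often it falls in the same month. It is an illustration only and does not reproduce any of the eight methods; the function names, the synthetic cosine series, and the assumption that the series starts in January and contains complete years are all hypothetical.

```python
from collections import Counter
import math


def peak_months(values, period=12):
    """Return the 1-based position of the maximum within each complete year."""
    n_years = len(values) // period
    peaks = []
    for y in range(n_years):
        year = values[y * period:(y + 1) * period]
        peaks.append(year.index(max(year)) + 1)
    return peaks


def peak_consistency(values, period=12):
    """Share of years whose peak falls in the most common peak month."""
    peaks = peak_months(values, period)
    if not peaks:
        raise ValueError("need at least one complete year")
    return Counter(peaks).most_common(1)[0][1] / len(peaks)


# Hypothetical monthly series: five years with a peak in the first month
# of every year (a pure cosine with a 12-month period).
series = [10 * math.cos(2 * math.pi * t / 12) for t in range(60)]
print(peak_months(series))       # [1, 1, 1, 1, 1]
print(peak_consistency(series))  # 1.0
```

A consistency near 1 corresponds to the peaks described for Fig3.ts1,…, Fig3.ts4; as the text notes, it does not by itself guarantee a seasonal classification by AA, AR, or ET when the series also varies between peaks.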
Negative agreement
The relationship between negative agreement and variation is more direct. The time series in Fig3.ts5,…, Fig3.ts9 are similar in that the results of the methods cannot be determined by visual inspection alone (recall that any linear trend in each of the original series was removed before the methods were applied). Given the similarity of the time series in Fig3.ts5,…, Fig3.ts9, it is reasonable to ask why they do not all show negative agreement. Ultimately, time series that are constant, or stationary around a constant mean with minimal variation, will result in negative agreement between the methods. However, a time series with large peaks and variation will also show negative agreement if there is no monthly or annual autocorrelation (e.g., a time series generated from N(μ, σ²)). As noted in the Results section, the 1498 time series for which the methods show negative agreement have an average variance of 0 to four decimal places.
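The white-noise case can be illustrated by comparing variance with autocorrelation at the seasonal lag. The sketch below is an assumption-laden example, not part of the study: the `autocorrelation` helper, the seed, and the parameters μ = 50 and σ = 20 are hypothetical choices. It draws 120 monthly values from N(μ, σ²) and prints the variance alongside the lag-12 sample autocorrelation, which should be close to zero despite the large variance.

```python
import random
import statistics


def autocorrelation(values, lag):
    """Sample autocorrelation of `values` at the given lag."""
    n = len(values)
    mean = statistics.fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        return 0.0  # constant series: treated here as having no autocorrelation
    num = sum((values[t] - mean) * (values[t + lag] - mean)
              for t in range(n - lag))
    return num / denom


# Hypothetical white noise: ten years of monthly values from N(50, 20^2).
rng = random.Random(0)
noise = [rng.gauss(50, 20) for _ in range(120)]

print(statistics.pvariance(noise))   # large variance
print(autocorrelation(noise, 12))    # near zero: no annual autocorrelation

# A constant series has zero variance and, by the convention above, zero autocorrelation.
constant = [5.0] * 24
print(statistics.pvariance(constant), autocorrelation(constant, 12))
```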
Generalization and limitations
We have explained general scenarios in which we can expect negative and positive agreement, but further generalization is more difficult. As Figure 3 reveals, there are thousands of different combinations of discord (M = 2168,…, 1267) for each time series, making it difficult to predict which particular combination of discord to expect based solely on visual inspection of the time series. However, an immediate consequence of this study is that researchers using different methods are implicitly defining seasonality differently. Given the discrepancy between the methods, researchers who rely on different methods are likely to obtain different results, leading to a conflicting understanding of the seasonality of a time series.
Finally, we note that the study and evaluation of the methods were limited to 10 observational databases and eight methods of binary seasonality classification. Different results might have been observed had one or more of these design choices been modified. As explained in the Discussion section, aspects of a time series that influence the classification of seasonality include variance, autocorrelation, peak persistence, and peak consistency. Time series constructed to influence one or more of these aspects could influence concordance. Adding dozens or hundreds of other databases to the 10 we chose might reveal different levels of agreement between the methods. Similarly, a different set of methods from the eight we chose might have resulted in different levels of concordance.