Prediction of Sustaining Emerging Technology Terms Using Burst Detection and Deep Learning

The early detection of emerging technologies is crucial for organizations to stay ahead of the technological curve and adapt to the changing market landscape. To this end, several approaches have been proposed to detect emerging technologies. However, these methods often suffer from limitations such as subjectivity due to manual curation, lack of scalability, and limited predictability. In this extended abstract, we present a hybrid approach that detects emerging technology terms using a burst detection algorithm and predicts their future trends by combining machine learning techniques with burst detection. We test the proposed approach on artificial intelligence (AI)-related patents. The results show that the proposed approach can predict the future prevalence of AI technology terms with an accuracy of 90.5%. We also demonstrate that deploying a deep neural network classifier can increase the accuracy by 10-15% compared with a conventional tree-based classifier. We hope the proposed framework helps decision-makers better identify emerging terms, especially in evolving multidisciplinary fields such as AI.


Introduction
Emerging technologies play a crucial role in shaping our future by driving innovation and solving complex global challenges. The early detection of emerging technologies is critical for staying ahead of the technological curve and enabling organizations to adapt to the changing market landscape. Many attempts have been made to detect emerging terms, topics, and technologies, ranging from lexical approaches that focus on term-related information [1]-[3] and bibliometric approaches that use indicators [4]-[9] to more complex approaches employing machine learning (ML) techniques [7]-[9] or even hybrid solutions [13]-[16].
Traditional approaches to detecting emerging technologies have the following limitations: 1) risk of subjectivity due to manual intervention, 2) lack of scalability, 3) lack of quantifiable metrics for evaluating the performance of the detection process, and 4) lack of predictability or low accuracy of future predictions. To address these limitations, this research has two main objectives: 1) to propose an emerging technology term detection framework that requires little tuning and manual intervention, and 2) to test and validate the proposed approach on a case technology field, i.e., artificial intelligence (AI), using a patent dataset.
In this work, we first detect emerging technology terms in the field of AI using a burst detection algorithm. We then predict the future trends of these terms by combining ML techniques with burst detection, which yields well-defined quantitative metrics for performance evaluation. The burst detection method is automated, which reduces the need for manual curation and the risk of human bias. Our results show that the presented approach can successfully predict the future prevalence of the detected emerging technology terms in the evolving field of AI with an accuracy of 90.5%. The results also show that applying a deep neural network (DNN) classifier on top of burst detection increases the accuracy of the method by 10%-15% compared with a conventional tree-based classifier.

Burst detection
In the literature, "bursty terms" are terms that experience a sudden increase in usage and popularity within a certain time frame, as opposed to terms that are simply consistently popular [14]. The earliest burst detection methods segmented the corpus into topics and traced changes in their popularity across years [15]. Latent Dirichlet Allocation (LDA) has become a common method for extracting topics from a collection of documents [16]. Despite its advantages, LDA has drawbacks, such as limited interpretability and the difficulty of linking topics from one time step to the next [17]. Alternatively, bursty terms can be identified and clustered using algorithms such as Kleinberg's burst detection algorithm [18]. This method has been used in a variety of contexts, including Twitter and news feeds [19], [20]. However, Kleinberg's method is not directly applicable to scientific literature, since papers are published in batches rather than as a continuous stream, and their rate of change is annual rather than real-time as with tweets or news. Other burst detection methods, including those based on stock market analysis, can address this limitation and be applied to scientific documents [17], [21].

Moving average convergence divergence
The Moving Average Convergence Divergence (MACD) is a technical analysis tool that uses exponential moving averages (EMAs) to smooth out stock price fluctuations and reveal underlying trends [21]. In this work, we use the MACD notions described in [21]. For a time span n and a time series y(t), such as the price of a stock at time t (or, in our case, the frequency of a term in scientific papers in year t), the EMA is defined recursively as:
$$\mathrm{EMA}(n)[y]_t = \alpha\, y(t) + (1 - \alpha)\, \mathrm{EMA}(n)[y]_{t-1}, \qquad \alpha = \frac{2}{n + 1} \qquad (1)$$
The MACD line is obtained by subtracting a long EMA from a short EMA (Eq. 2), where the long EMA covers more time spans than the short one ($n_1 < n_2$; the series argument is omitted when clear):
$$\mathrm{MACD}(n_1, n_2) = \mathrm{EMA}(n_1) - \mathrm{EMA}(n_2) \qquad (2)$$
The MACD line is then averaged with an EMA of a third span, $n_3$, producing the signal line (Eq. 3):
$$\mathrm{signal}(n_1, n_2, n_3) = \mathrm{EMA}(n_3)\big[\mathrm{MACD}(n_1, n_2)\big] \qquad (3)$$
The signal line is a smoothed MACD line; by reducing noise, it makes the trend easier to see and helps identify buy and sell signals, i.e., the points at which the trend turns upward or downward.
The histogram compares the MACD line with the signal line and serves as an indicator of price acceleration (Eq. 4):
$$\mathrm{hist}(n_1, n_2, n_3) = \mathrm{MACD}(n_1, n_2) - \mathrm{signal}(n_1, n_2, n_3) \qquad (4)$$
When the trend is accelerating upward, the histogram is positive. Therefore, a change in the histogram from negative to positive can signal a shift from a downward to an upward trend (Fig. 1).
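As an illustration, the MACD line, signal line, and histogram described above can be computed in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: it assumes the common smoothing factor α = 2/(n + 1) and seeds each EMA with the first observation (the same recursion as pandas' `ewm(span=n, adjust=False)`); the function names `ema` and `macd` are hypothetical.

```python
def ema(series, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def macd(series, n1, n2, n3):
    """Return the MACD line, signal line, and histogram for spans n1 < n2 and n3."""
    macd_line = [s - l for s, l in zip(ema(series, n1), ema(series, n2))]
    signal_line = ema(macd_line, n3)  # EMA of the MACD line
    histogram = [m - s for m, s in zip(macd_line, signal_line)]
    return macd_line, signal_line, histogram


# Toy term frequency: flat for five years, then rising.
frequency = [1, 1, 1, 1, 1, 2, 4, 8]
macd_line, signal_line, histogram = macd(frequency, n1=3, n2=6, n3=2)
```

On the flat segment the histogram stays at zero (up to rounding); in the first rising year it turns positive, which is the kind of upward signal illustrated in Fig. 1.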
The MACD has been applied to scientific data, e.g., to the frequency of Medical Subject Headings (MeSH) in scientific papers over time instead of stock prices [21]. A modified version has also been used to identify bursty terms in the computer science field [17].

Data
Data were extracted from the Derwent Innovation patent database in the Web of Science using the query ("artificial intelligence" OR "deep learning" OR "machine learning") on titles and abstracts, for the period from 2009 to 2022. We extracted the title and abstract of each patent and vectorized them by year. We removed English stopwords and common words using two different thesauri, and we filtered out terms unrelated to the AI domain.

Filtering and normalization
We faced two main challenges. First, the output terms can be noisy and include many terms that appear in very few patents. Second, because there is an upward trend in the AI domain, raw term counts grow over time, not because the terms are more emergent or bursty but simply because the data size increases. To overcome these challenges, we followed the two-step approach proposed in [17]: 1) remove any term that does not appear in more than 0.02% of abstracts for at least 3 successive years, and 2) normalize the frequency counts twice, first by the number of documents in each year and then by the total number of tokens in each document. The resulting normalized frequency, called prevalence, was used as the main input in the subsequent calculations.
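The two filtering and normalization steps can be sketched in plain Python. This follows one reading of the text, stated here as an assumption: the prevalence p(w, t) is the average, over the documents of year t, of the term's count divided by the document's token count, and the frequency filter keeps a term only if its document share exceeds 0.02% for at least three consecutive years. The helper names are hypothetical.

```python
from collections import Counter, defaultdict


def prevalence_and_share(docs):
    """docs: list of (year, tokens) pairs.

    Returns two nested dicts keyed by term, then year:
      prevalence[w][t] = (1 / N_t) * sum over docs d in year t of count(w, d) / len(d)
      share[w][t]      = fraction of year-t documents that contain w
    """
    n_docs = Counter(year for year, _ in docs)
    prevalence = defaultdict(lambda: defaultdict(float))
    share = defaultdict(lambda: defaultdict(float))
    for year, tokens in docs:
        if not tokens:
            continue  # an empty document still counts towards N_t
        for term, count in Counter(tokens).items():
            prevalence[term][year] += count / len(tokens) / n_docs[year]
            share[term][year] += 1 / n_docs[year]
    return prevalence, share


def passes_frequency_filter(share_by_year, years, min_share=0.0002, min_run=3):
    """True if the term appears in more than min_share (0.02%) of documents
    for at least min_run consecutive years."""
    run = 0
    for year in years:
        run = run + 1 if share_by_year.get(year, 0.0) > min_share else 0
        if run >= min_run:
            return True
    return False


docs = [
    (2009, ["neural", "network"]),
    (2009, ["neural", "neural", "patent"]),
    (2010, ["neural"]),
    (2011, ["neural", "robot"]),
]
prevalence, share = prevalence_and_share(docs)
years = [2009, 2010, 2011]
kept = [w for w in share if passes_frequency_filter(share[w], years)]
```

Normalizing per document and per year removes the effect of the growing number of AI patents, so a term's prevalence rises only if its share of the text rises.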

Applying MACD
We set the moving-average spans $(n_1, n_2, n_3)$ to (3, 6, 2). Methods differ in how they calculate burstiness: He and Parker [21] used the raw value of the histogram as a measure of burstiness, while Tattershall et al. [17] scaled the histogram by the square root of the historical maximum prevalence. We followed Tattershall et al. [17], as this produces more consistent results. The burstiness of a term w at time t is calculated from its prevalence p(w, t) as follows:
$$\mathrm{burstiness}(n_1, n_2, n_3)\big[p(w, t)\big] = \frac{\mathrm{hist}(n_1, n_2, n_3)\big[p(w, t)\big]}{\sqrt{\max_{t' \le t} p(w, t')}} \qquad (5)$$
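A sketch of this burstiness score with spans (3, 6, 2), under two assumptions: EMAs use α = 2/(n + 1) seeded with the first value, and the "historical maximum" is the running maximum of prevalence up to year t. The function name `burstiness` is hypothetical.

```python
import math


def ema(series, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def burstiness(prevalence, n1=3, n2=6, n3=2):
    """MACD histogram of a yearly prevalence series, scaled by the square root
    of the running (historical) maximum prevalence up to each year."""
    macd_line = [s - l for s, l in zip(ema(prevalence, n1), ema(prevalence, n2))]
    signal_line = ema(macd_line, n3)
    scores, running_max = [], 0.0
    for p, m, s in zip(prevalence, macd_line, signal_line):
        running_max = max(running_max, p)
        # A term not yet seen has no defined scale; treat it as not bursty.
        scores.append((m - s) / math.sqrt(running_max) if running_max > 0 else 0.0)
    return scores


prevalence = [0.0, 0.0, 0.0, 0.001, 0.004, 0.009]
scores = burstiness(prevalence)
scaled = burstiness([4 * p for p in prevalence])
```

Because the histogram is linear in prevalence while the scale factor grows with its square root, multiplying a term's prevalence series by 4 only doubles its burstiness, so very common terms dominate the ranking less than they would with the raw histogram.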

Predicting the emergence using MACD features
The next step is to build a supervised learning model that receives a term and its features as input and produces a label as output. Following Tattershall et al. [17], we built the model on MACD features. The prediction interval I is the prediction time window, i.e., the number of years ahead for which the prediction is made (e.g., 3). For each year $y_i$, the algorithm works as follows:
1. Consider the whole dataset $D(y_{i-13}, y_i)$, where 13 years is the time span of our data.
2. Apply burst detection to $D(y_{i-13}, y_i)$ and select all terms whose burstiness exceeds a threshold that we choose as a parameter.
3. Calculate the MACD, histogram, standard deviation, minimum, and maximum values as the input X.
4. Determine whether the smoothed prevalence of the term in year $y_i + I$ is higher or lower than its prevalence in year $y_i$; this is the label Y.
5. Append this year's X and Y to those of the previous years.
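The five steps above can be sketched as a single Python function that turns yearly prevalence series into feature rows X and labels Y. Several details are not specified in the text and are assumptions here: the smoothing used for the label (an EMA with span 2), computing the standard deviation, minimum, and maximum over the prevalence history up to $y_i$, and using the scaled histogram as the burstiness score for the threshold. Because every EMA is causal, computing it once over the full series and reading position i uses only data up to $y_i$. The function `build_training_set` is hypothetical.

```python
import math
import statistics


def ema(series, span):
    """EMA with alpha = 2 / (span + 1), seeded with the first value (assumed convention)."""
    alpha = 2.0 / (span + 1)
    smoothed = [series[0]]
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def build_training_set(prevalence_by_term, interval=3, min_burstiness=0.0,
                       spans=(3, 6, 2), label_span=2):
    """Turn yearly prevalence series into (X, Y) following steps 1-5.

    Hypothetical helper: label_span, the feature definitions, and the threshold
    semantics are assumptions, not taken from the paper.
    """
    n1, n2, n3 = spans
    X, Y = [], []
    for series in prevalence_by_term.values():
        macd_line = [s - l for s, l in zip(ema(series, n1), ema(series, n2))]
        histogram = [m - s for m, s in zip(macd_line, ema(macd_line, n3))]
        smoothed = ema(series, label_span)
        running_max = 0.0
        # Year index i needs a label at i + interval, so stop `interval` years early.
        for i in range(len(series) - interval):
            running_max = max(running_max, series[i])
            burst = histogram[i] / math.sqrt(running_max) if running_max > 0 else 0.0
            if burst <= min_burstiness:  # step 2: keep only bursty terms
                continue
            history = series[: i + 1]  # data up to y_i only
            # Step 3: features; step 4: 1 if smoothed prevalence at y_i + I exceeds p at y_i.
            X.append([macd_line[i], histogram[i], statistics.pstdev(history),
                      min(history), max(history)])
            Y.append(int(smoothed[i + interval] > series[i]))
    return X, Y  # step 5: rows from all years accumulated together


prevalence_by_term = {
    "rising term": [0.0, 0.0, 0.0, 0.001, 0.002, 0.004, 0.008, 0.016],
    "never seen": [0.0] * 8,
}
X, Y = build_training_set(prevalence_by_term)
```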

Baseline machine learning model
Prior works [17], [21], [22] used tree-based methods to predict the popularity of clusters or terms. Following them, we trained a random forest classifier as the baseline using 10-fold cross-validation. After tuning its hyperparameters, we found the best maximum depth to be 5.
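A sketch of such a baseline with scikit-learn, tuning `max_depth` by 10-fold cross-validation. The data are synthetic placeholders for the five MACD features, and the candidate depths are illustrative; the paper reports a best depth of 5 on its own data, whereas the search on these synthetic data may select a different value.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Synthetic placeholder data: one row per (term, year) example with five
# features standing in for MACD, histogram, std, min, and max.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = (X[:, 1] + 0.3 * rng.normal(size=300) > 0).astype(int)

# 10-fold cross-validation over a few illustrative depths.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
search = GridSearchCV(
    RandomForestClassifier(n_estimators=100, random_state=0),
    param_grid={"max_depth": [3, 5, 10, None]},
    cv=cv,
    scoring="accuracy",
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))
```

Capping the depth keeps individual trees shallow, which limits overfitting when the number of labeled (term, year) examples is small.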

Construction of a neural network classifier
We built a multilayer perceptron (MLP) neural network classifier and compared its results with those of the baseline random forest classifier. We chose an MLP classifier on MACD features because time series problems such as stock market prediction and emerging technology detection are complex, and deep learning models may outperform conventional machine learning methods because they can capture more complex relationships. The MLP classifier was evaluated on an unseen test set, and the same training and test sets were used to evaluate both the MLP and the baseline classifiers. We determined the number of layers and the number of units per layer experimentally: the best performance for the initial model was obtained with 2 layers, with 16 nodes in the first layer and 8 nodes in the second.
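A sketch of this comparison on a shared train/test split, using scikit-learn's `MLPClassifier` with two hidden layers of 16 and 8 units. Standardizing the features is an assumption added here (MLPs are sensitive to feature scale, and prevalence-based MACD values are small); the synthetic data, split ratio, and iteration budget are placeholders, so the printed scores say nothing about the paper's results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic placeholder data with a nonlinear decision boundary.
rng = np.random.default_rng(1)
X = rng.normal(size=(600, 5))
y = (X[:, 0] * X[:, 1] + 0.5 * X[:, 2] > 0).astype(int)

# One held-out test set shared by both classifiers.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# Two hidden layers with 16 and 8 units; feature scaling is an added assumption.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0),
)
forest = RandomForestClassifier(max_depth=5, random_state=0)

accuracy = {}
for name, model in [("MLP", mlp), ("random forest", forest)]:
    model.fit(X_train, y_train)
    accuracy[name] = model.score(X_test, y_test)
print(accuracy)
```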

Results
Our results confirm that the burst detection method can successfully predict the future prevalence of the detected emerging technology terms, with an accuracy of 90.46% using a two-layer MLP model. Applying the MLP classifier, we predicted the future prevalence of patent technology terms in 2025. The model predicted that the prevalence of terms related to "convolutional neural networks" is falling, whereas terms related to "smartphones" or "deep belief networks" are rising. We also found that combining deep learning with burst detection can increase the accuracy by 10% to 15% compared with a random forest, as shown in Table 1.

Conclusion
The purpose of this research was to investigate and tailor a stock market-inspired burst detection algorithm to detect emerging technology terms, predict their future prevalence in the field of AI using a patent dataset, and improve prediction accuracy by employing a neural network model. Using a random forest classifier as the baseline and an MLP classifier, we predicted whether terms would increase or decrease in popularity over time with an accuracy of 75.3% for the random forest model and 90.5% for the MLP model. We also predicted the prevalence, in 2025, of technology terms found in patent titles and abstracts. This work is an attempt to combine burst detection with deep learning techniques in the context of emerging technology detection. However, this research has limitations: it was applied only to AI patents from 2009 to 2022. The proposed approach should be further validated on other fields, data types, and time spans; in future work, we plan to apply it to other data types, such as news, papers, or funding data, and to other fields.

Figure 1. MACD, signal line, and histogram, and how they can signal an increase in a time series variable. The graph was generated from a random sample dataset to demonstrate the "emergence signal" based on the frequency of a term in scientific papers.

Table 1. Machine learning results on the prediction AUC of the models