7th International Conference on Computer Science, Engineering and Information Technology (CSITY 2021)

September 18 ~ 19, 2021, Copenhagen, Denmark


Accepted Papers


Question Answering Systems and Inclusion: Pros and Cons

Victoria Firsanova, Department of Mathematical Linguistics, Saint Petersburg State University, Saint Petersburg, Russia

ABSTRACT

In the field of inclusion, automated QA could become an effective tool that allows people, for example, to ask questions anonymously about the interaction between neurotypical and atypical people and to get reliable information immediately. However, controlling such systems is challenging. Before QA is integrated into inclusion practice, research is required to prevent the generation of misleading and false answers and to verify that a system is safe and does not misrepresent or alter information. Although the problem of data misrepresentation is not new, the approach presented in the paper is novel because it highlights a particular NLP application in the field of social policy and healthcare. The study focuses on extractive and generative QA models based on the pre-trained Transformers BERT and GPT-2, fine-tuned on a Russian dataset for the inclusion of people with autism spectrum disorder. The source code is available on GitHub: https://github.com/vifirsanova/ASD-QA.

KEYWORDS

Natural Language Processing, Question Answering, Information Extraction, BERT, GPT-2.


Low-Resource Named Entity Recognition without Human Annotation

Zhenshan Bao, Yuezhang Wang and Wenbo Zhang, College of Computer Science, Beijing University of Technology, Beijing, China

ABSTRACT

Named entity recognition (NER), one of the most fundamental tasks in natural language processing (NLP), has received extensive attention. Most existing approaches to NER rely on a large amount of high-quality annotations or on fairly complete domain-specific entity lists. However, in practice, manually annotated data is very expensive to obtain, and the available entity lists are often not comprehensive. Using an entity list to annotate data automatically is a common method, but under low-resource conditions the automatically annotated data is usually imperfect and includes incompletely annotated or unannotated data. In this paper, we propose a NER system for complex data processing that can use an entity list containing only a few entities to obtain incompletely annotated data and train the NER model without human annotation. Our system extracts semantic features from a small number of samples by introducing a pretrained language model. Based on a model trained on the incomplete annotations, we relabel the data using a cross-iteration approach. We apply a data filtering method to the training data used in the iteration process and re-annotate the incomplete data over multiple iterations to obtain high-quality data. Each iteration groups and processes the data according to the type of annotation, which improves model performance faster and reduces the number of iterations. The experimental results demonstrate that our proposed system can effectively perform low-resource NER tasks without human annotation.
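
The automatic annotation step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the function name annotate_with_entity_list, the BIO tag scheme, the longest-match rule and the toy sentence are all assumptions made for illustration. It shows why a short entity list yields incomplete annotations: any entity missing from the list is silently tagged "O".

```python
def annotate_with_entity_list(tokens, entity_list):
    """Assign BIO tags by longest-match lookup against an entity list.

    entity_list maps a tuple of tokens to an entity type,
    e.g. ("New", "York") -> "LOC". Both names are hypothetical.
    """
    max_len = max((len(key) for key in entity_list), default=0)
    tags = ["O"] * len(tokens)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_list:
                etype = entity_list[span]
                tags[i] = "B-" + etype
                for j in range(i + 1, i + n):
                    tags[j] = "I-" + etype
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return tags


# Toy example: "Bob" is a person but is absent from the small entity list,
# so it stays "O". This is the kind of incomplete annotation the abstract
# says must be repaired by later re-annotation.
tokens = ["Alice", "moved", "to", "New", "York", "with", "Bob"]
entity_list = {("Alice",): "PER", ("New", "York"): "LOC"}
tags = annotate_with_entity_list(tokens, entity_list)
```

In a pipeline of the kind the abstract describes, tags produced this way would only serve as a starting point for training and iterative relabelling.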

KEYWORDS

Named entity recognition, Low resource natural language processing, Complex annotated data, Cross-iteration.


Online Assessment of English for Specific Purposes

Renáta Nagy, Doctoral School of Health Sciences, Department of Languages for Biomedical Purposes and Communication, Medical School, University of Pécs, Hungary

ABSTRACT

The presentation addresses the online assessment of English for Specific Purposes, focusing on online delivery as a possible form of language testing. The topic is timely, and its main aim is to examine the validity of online testing. A positive outcome of the study would point to a promising future, in a number of respects, not only for language assessors but for future candidates as well: namely, a basic online setup that could be used worldwide for online tests. To achieve this, the research covers not only the theoretical but also the first-hand empirical side of testing, from the point of view of both examiners and examinees. Materials and methods include surveys, needs analysis and trial versions of online tests. In this context, the presentation considers the possible questions, techniques and approaches in online assessment, which can also be used in language lessons as a classroom technique.

KEYWORDS

Assessment, Online, ESP, Online assessment, validity, testing.


End-to-End Chinese Dialect Discrimination with Self-Attention

Yangjie Dan, Fan Xu*, Mingwen Wang, School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China

ABSTRACT

Dialect discrimination is of practical importance for protecting the inheritance of dialects. Traditional dialect discrimination methods focus on the underlying acoustic features and ignore the meaning of the pronunciation itself, which results in low performance. This paper systematically explores the validity of pronunciation features of dialect speech, composed of phoneme sequence information, for dialect discrimination, and designs an end-to-end dialect discrimination model based on the multi-head self-attention mechanism. Specifically, we first adopt a residual convolutional neural network and the multi-head self-attention mechanism to extract the phoneme sequence features unique to different dialects and compose novel phonetic features from them. Then, we perform dialect discrimination on the extracted phonetic features using the self-attention mechanism and bidirectional long short-term memory networks. The experimental results on the large-scale benchmark 10-way Chinese dialect corpus released by iFLYTEK show that our model outperforms the state-of-the-art alternatives by a large margin.
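
For readers unfamiliar with the multi-head self-attention mechanism named in this abstract, the following is a minimal pure-Python sketch of the general technique, not the authors' model. As a simplifying assumption it omits the learned query, key, value and output projections: each head simply attends over its own slice of the feature dimension. The function names and the toy input are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(queries[0])
    output = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # Each output row is a weighted average of the value vectors.
        output.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return output


def multi_head_self_attention(frames, num_heads):
    """Split features into heads, attend per head, concatenate the results.

    Assumption: no learned projections, so queries = keys = values = the slice.
    """
    d_model = len(frames[0])
    if d_model % num_heads != 0:
        raise ValueError("feature size must be divisible by num_heads")
    d_head = d_model // num_heads
    heads = []
    for h in range(num_heads):
        chunk = [row[h * d_head:(h + 1) * d_head] for row in frames]
        heads.append(attention(chunk, chunk, chunk))
    # Concatenate the per-head outputs for every time step.
    return [
        [value for h in range(num_heads) for value in heads[h][t]]
        for t in range(len(frames))
    ]


# Toy input: 3 time steps (e.g. speech frames) with 4 features each.
frames = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.0], [1.0, 1.0, 0.0, 0.5]]
mixed = multi_head_self_attention(frames, num_heads=2)
```

The output has the same shape as the input; each time step becomes a context-dependent mixture of all time steps, computed separately within each head.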

KEYWORDS

Dialect discrimination, Multi-head attention mechanism, Phonetic sequence, Connectionist temporal classification.


FCM – Computerized Calculations vs the Role of Experts

Arthur Yosef¹, Eli Shnaider² and Moti Schneider², ¹Tel Aviv-Yaffo Academic College, Israel, ²Netanya Academic College, Israel

ABSTRACT

This study presents a method for assigning relative weights when constructing Fuzzy Cognitive Maps (FCMs). We introduce a method of computing the relative weights of directed edges based on the actual past behavior (historical data) of the relevant concepts. We also discuss the role of experts in the process of constructing FCMs. The method presented here is intuitive and does not require any restrictive assumptions. The weights are estimated during the design stage of the FCM, before the recursive simulations are performed.
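
The recursive simulations mentioned in this abstract can be sketched as follows. This shows one common FCM update rule (a sigmoid applied to the weighted sum of incoming concept activations) and is not taken from the paper; the paper's weight-estimation method is not reproduced here. The function names, the convergence tolerance and the toy weight matrices are assumptions.

```python
import math


def sigmoid(x, steepness=1.0):
    """Squashing function that keeps activations in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))


def fcm_step(state, weights):
    """One FCM update: weights[j][i] is the influence of concept j on concept i."""
    n = len(state)
    return [
        sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
        for i in range(n)
    ]


def simulate(state, weights, max_steps=100, tol=1e-6):
    """Iterate until the state stops changing or max_steps is reached.

    Returns the final state and the number of update steps performed.
    """
    for step in range(max_steps):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state, step + 1
        state = new_state
    return state, max_steps


# Toy map with two concepts and made-up weights: concept 0 promotes
# concept 1, and concept 1 inhibits concept 0.
toy_weights = [[0.0, 0.8], [-0.6, 0.0]]
final_state, steps = simulate([0.5, 0.5], toy_weights)

# With no edges at all, every concept settles at sigmoid(0) = 0.5.
idle_state, idle_steps = simulate([0.2, 0.9], [[0.0, 0.0], [0.0, 0.0]])
```

Because the weights are fixed before this loop runs, a design-stage estimate of them, as the abstract describes, fully determines the simulated trajectory.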

KEYWORDS

FCM, relative importance (weight), Fuzzy Logic, Soft Computing, Neural Networks.


High-Frequency Cryptocurrency Trading Strategy Using Tweet Sentiment Analysis

Zhijun Chen, Department of Financial Engineering, SUSTech University, Shenzhen, China

ABSTRACT

Sentiments are extracted from tweets carrying cryptocurrency hashtags to predict prices, and the sentiment prediction model generates the parameters for an optimization procedure that then makes decisions and re-allocates the portfolio. After prediction, an evaluation based on RMSE, MAE and R² selects the KNN and CART models for predicting Bitcoin and Ethereum, respectively. During portfolio optimization, the project uses predictive prescription to make the solution robust to uncertainty while taking full advantage of auxiliary data such as sentiments. As for the outcome of the optimization, the portfolio allocation and returns fluctuate sharply, as illustrated in the figure.
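
The three evaluation metrics named in this abstract have standard definitions, sketched below in plain Python. The function names and the toy price series are illustrative assumptions and are not data from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Undefined (division by zero) when y_true is constant.
    """
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices and predictions, for illustration only.
actual = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]

scores = {
    "rmse": rmse(actual, predicted),
    "mae": mae(actual, predicted),
    "r2": r2(actual, predicted),
}
```

Lower RMSE and MAE and a higher R² indicate a better fit, which is the kind of comparison the abstract describes for choosing between candidate models.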

KEYWORDS

Cryptocurrency Trading Portfolio, Sentiment Analysis, Machine Learning, Predictive Prescription, Robust Optimization Portfolio.