Victoria Firsanova, Department of Mathematical Linguistics, Saint Petersburg State University, Saint Petersburg, Russia
In inclusive education, automated QA could become an effective tool that allows users, for example, to ask questions anonymously about the interaction between neurotypical and atypical people and receive reliable information immediately. However, the controllability of such systems is challenging. Before QA is integrated into inclusive practice, research is required to prevent the generation of misleading and false answers and to verify that a system is safe and does not misrepresent or alter information. Although the problem of data misrepresentation is not new, the approach presented in this paper is novel because it highlights a particular NLP application in the field of social policy and healthcare. The study focuses on extractive and generative QA models based on the BERT and GPT-2 pre-trained Transformers, fine-tuned on a Russian dataset for the inclusion of people with autism spectrum disorder. The source code is available on GitHub: https://github.com/vifirsanova/ASD-QA.
Natural Language Processing, Question Answering, Information Extraction, BERT, GPT-2.
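The extractive QA setup described above can be illustrated with the standard span-selection step used on top of BERT-style models: given per-token start and end scores, pick the best valid answer span. The following sketch is a minimal, self-contained illustration with toy logits (the real model, tokenizer, and dataset from the paper are not reproduced here); the `max_len` constraint and the toy numbers are assumptions for demonstration.

```python
import math

def best_span(start_logits, end_logits, max_len=15):
    """Pick the answer span (i, j) maximizing start_logits[i] + end_logits[j],
    subject to i <= j and span length <= max_len."""
    best = (0, 0)
    best_score = -math.inf
    for i, s in enumerate(start_logits):
        for j in range(i, min(i + max_len, len(end_logits))):
            score = s + end_logits[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best, best_score

# Toy logits over a 6-token context: the scores are most confident
# that the answer starts at token 2 and ends at token 4.
start = [0.1, 0.2, 3.0, 0.5, 0.1, 0.0]
end   = [0.0, 0.1, 0.4, 0.8, 2.5, 0.2]
span, score = best_span(start, end)
print(span)  # (2, 4)
```

In a full system these logits would come from the fine-tuned BERT head; the span indices are then mapped back to the original context to produce the extracted answer.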
Zhenshan Bao, Yuezhang Wang and Wenbo Zhang, College of Computer Science, Beijing University of Technology, Beijing, China
Named entity recognition (NER), one of the most fundamental tasks in natural language processing (NLP), has received extensive attention. Most existing approaches to NER rely on a large amount of high-quality annotations or on relatively complete entity lists. In practice, however, manually annotated data is very expensive to obtain, and the available entity lists are often not comprehensive. Using an entity list to automatically annotate data is a common annotation method, but under low-resource conditions the automatically annotated data is usually imperfect, containing incompletely annotated or unannotated instances. In this paper, we propose an NER system for processing such complex data, which can use an entity list containing only a few entities to obtain incompletely annotated data and train the NER model without human annotation. Our system extracts semantic features from a small number of samples by introducing a pre-trained language model. Based on the model trained on incomplete annotations, we relabel the data using a cross-iteration approach. We apply a data filtering method to select the training data used in the iteration process, and re-annotate the incomplete data through multiple iterations to obtain high-quality data. Each iteration groups and processes the data according to its annotation type, which improves model performance faster and reduces the number of iterations. The experimental results demonstrate that our proposed system can effectively perform low-resource NER tasks without human annotation.
Named entity recognition, Low resource natural language processing, Complex annotated data, Cross-iteration.
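The starting point of the pipeline above — projecting a small entity list onto raw text to obtain incomplete annotations — can be sketched as follows. This is a minimal, self-contained illustration; the entity list, tokens, and BIO tagging scheme here are assumptions for demonstration, and the subsequent cross-iteration relabeling with a pre-trained language model is not reproduced.

```python
def auto_annotate(tokens, entity_list):
    """Distant supervision: project a small entity list onto a token
    sequence, producing (incomplete) BIO tags. Entities missing from
    the list remain 'O'; the cross-iteration step would later try
    to relabel them."""
    tags = ["O"] * len(tokens)
    for name, etype in entity_list.items():
        parts = name.split()
        n = len(parts)
        for i in range(len(tokens) - n + 1):
            if tokens[i:i + n] == parts:
                tags[i] = f"B-{etype}"
                for k in range(i + 1, i + n):
                    tags[k] = f"I-{etype}"
    return tags

# Tiny, incomplete list: the person entity "Bob" is missing,
# so its tokens are (wrongly) left as 'O' in the auto-annotation.
entities = {"Beijing": "LOC"}
tokens = "Bob works in Beijing".split()
print(auto_annotate(tokens, entities))  # ['O', 'O', 'O', 'B-LOC']
```

The resulting partially labeled data is exactly the "incomplete annotation" case the paper's iterative filtering and re-annotation procedure is designed to repair.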
Renáta Nagy, Doctoral School of Health Sciences, Department of Languages for Biomedical Purposes and Communication Medical School, University of Pécs, Hungary
The presentation is about the online assessment of English for Specific Purposes, focusing on the online format as a possible form of language testing. The topic is timely, and its main aim is to address the intriguing question of the validity of online testing. A positive outcome of the study would indicate a promising future in a number of respects, not only for language assessors but for future candidates as well: namely, a basic online setup that could be used worldwide for online tests. To achieve this, the research involves not only the theoretical side of testing but also first-hand empirical evidence from the point of view of examiners and examinees alike. Materials and methods include surveys, needs analyses and trial versions of online tests. In this context, the presentation focuses on the possible questions, techniques and approaches to online assessment, which can also be used in language lessons as a classroom technique.
Assessment, Online, ESP, Online assessment, validity, testing.
Yangjie Dan, Fan Xu*, Mingwen Wang, School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China
Dialect discrimination has important practical significance for protecting and passing on dialect heritage. Traditional dialect discrimination methods pay much attention to low-level acoustic features and ignore the meaning of the pronunciation itself, resulting in low performance. This paper systematically explores the validity of pronunciation features composed of phoneme sequence information for dialect discrimination, and designs an end-to-end dialect discrimination model based on the multi-head self-attention mechanism. Specifically, we first adopt a residual convolutional neural network and the multi-head self-attention mechanism to effectively extract the phoneme sequence features unique to different dialects, composing novel phonetic features. Then, we perform dialect discrimination on the extracted phonetic features using the self-attention mechanism and bidirectional long short-term memory networks. The experimental results on the large-scale benchmark 10-way Chinese dialect corpus released by iFLYTEK show that our model outperforms the state-of-the-art alternatives by a large margin.
Dialect discrimination, Multi-head attention mechanism, Phonetic sequence, Connectionist temporal classification.
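The self-attention step at the core of the model above can be illustrated with a bare-bones scaled dot-product attention over a short sequence of phoneme embeddings. This is a single-head, pure-Python simplification with identity Q/K/V projections and toy 2-D vectors, all assumptions for demonstration; the paper's actual model uses multiple heads, learned projections, a residual CNN front-end, and BiLSTM layers.

```python
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def self_attention(seq):
    """Scaled dot-product self-attention with identity Q/K/V projections:
    each position attends to every position, weighted by similarity."""
    d = len(seq[0])
    out = []
    for q in seq:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in seq])
        out.append([sum(w * v[i] for w, v in zip(weights, seq))
                    for i in range(d)])
    return out

# Toy 2-D "phoneme embeddings" for a 3-phoneme sequence.
phonemes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
contextual = self_attention(phonemes)
print(len(contextual), len(contextual[0]))  # 3 2
```

Each output vector is a convex combination of the input phoneme vectors, so similar phoneme patterns reinforce each other — the mechanism by which dialect-specific phoneme sequences become discriminative features.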
Arthur Yosef1, Eli Shnaider2 and Moti Schneider2, 1Tel Aviv-Yaffo Academic College, Israel, 2Netanya Academic College, Israel
This study presents a method for assigning relative weights when constructing Fuzzy Cognitive Maps (FCMs). We introduce a method for computing the relative weights of directed edges based on the actual past behavior (historical data) of the relevant concepts. We also discuss the role of experts in the process of constructing FCMs. The method presented here is intuitive and does not require any restrictive assumptions. The weights are estimated during the design stage of the FCM, before the recursive simulations are performed.
FCM, relative importance (weight), Fuzzy Logic, Soft Computing, Neural Networks.
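The idea of deriving edge weights from historical data can be sketched with a simple correlation-based scheme: weight each incoming edge of a concept by the normalized correlation between the cause's and the target's historical series. The abstract does not specify the paper's exact estimation procedure, so the Pearson-correlation weighting, the concept names, and the toy series below are all illustrative assumptions.

```python
import math

def pearson(x, y):
    """Pearson correlation between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def edge_weights(target, causes):
    """Weight each directed edge cause -> target by its correlation
    with the target, normalized so absolute weights sum to 1 and the
    sign (positive/negative causal influence) is preserved."""
    corrs = {name: pearson(series, target) for name, series in causes.items()}
    total = sum(abs(c) for c in corrs.values())
    return {name: c / total for name, c in corrs.items()}

# Hypothetical historical data for a target concept and two causes.
demand = [10, 12, 15, 14, 18]
causes = {"price":       [5, 4, 3, 3, 2],   # moves opposite to demand
          "advertising": [1, 2, 3, 3, 4]}   # moves with demand
weights = edge_weights(demand, causes)
```

Here "price" receives a negative weight and "advertising" a positive one, matching the signed edges an expert would draw; the data-driven magnitudes then replace purely subjective weight guesses before the recursive FCM simulation runs.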
Abdul Musavvir Parappathiyil, Department of Mathematics, Pondicherry University, India
In this article, a linear pentagonal fuzzy number (PFN) is defined. The symmetrical and non-symmetrical PFNs pertaining to the linear PFN are also defined. Basic arithmetic operations on linear PFNs, such as addition and multiplication, are presented. Moreover, the concept of classical two-dimensional (2-D) pentagonal fuzzy number matrices (PFMs) is introduced. In addition, the notion of multidimensional pentagonal fuzzy number matrices (MDPFMs) is discussed, along with some of its rules and operations, such as multiplication. Finally, in the light of the rules relating to both 2-D PFMs and MDPFMs, we make use of MDPFMs to solve the fully fuzzy linear system of equations (FFLSE) with pentagonal fuzzy numbers as inputs. Two methods, the singular value decomposition (SVD) method and the row reduced echelon (RRE) method, are also discussed for solving the FFLSE, with a numerical example.
MDPFMs, FFLSE for MDPFMs with RRE method, FFLSE for MDPFMs with SVD method.
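The basic PFN arithmetic mentioned above can be illustrated by representing a linear pentagonal fuzzy number as an ordered 5-tuple (a, b, c, d, e) with a ≤ b ≤ c ≤ d ≤ e. Addition is component-wise; for multiplication of non-negative PFNs, the component-wise product is a common textbook simplification, used here as an assumption since the abstract does not spell out the paper's exact multiplication rule.

```python
def pfn_add(p, q):
    """Add two linear pentagonal fuzzy numbers (a, b, c, d, e)
    component-wise."""
    return tuple(x + y for x, y in zip(p, q))

def pfn_scale(k, p):
    """Multiply a PFN by a non-negative crisp scalar k."""
    return tuple(k * x for x in p)

def pfn_mul(p, q):
    """Component-wise product of two non-negative PFNs -- a common
    approximation for positive fuzzy numbers, assumed here for
    illustration."""
    return tuple(x * y for x, y in zip(p, q))

A = (1, 2, 3, 4, 5)
B = (2, 3, 4, 5, 6)
print(pfn_add(A, B))  # (3, 5, 7, 9, 11)
print(pfn_mul(A, B))  # (2, 6, 12, 20, 30)
```

Matrices of such 5-tuples give the 2-D PFMs of the paper; solving an FFLSE then amounts to applying a crisp method (RRE or SVD) to the component systems.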
Zhijun Chen, Department of Financial Engineering, SUSTech University, Shen Zhen, China
Sentiments are extracted from tweets hashtagged with cryptocurrency names to predict prices, and the sentiment-based prediction model generates the parameters for an optimization procedure that makes decisions and re-allocates the portfolio in a subsequent step. After the prediction process, an evaluation conducted with RMSE, MAE and R2 selects the KNN and CART models for predicting Bitcoin and Ethereum, respectively. In the portfolio optimization, this project uses predictive prescriptions to make the allocation robust to uncertainty while taking full advantage of auxiliary data such as sentiments. In the resulting optimization, the portfolio allocation and returns fluctuate sharply, as the figures illustrate.
Cryptocurrency Trading Portfolio, Sentiment Analysis, Machine Learning, Predictive Prescription, Robust Optimization Portfolio.
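The model-selection step described above — scoring candidate predictors with RMSE, MAE and R2 on held-out prices and keeping the best — can be sketched as follows. The toy price series and candidate predictions are assumptions for demonstration; the actual KNN and CART models and the tweet-sentiment features are not reproduced.

```python
import math

def rmse(y, yhat):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y))

def mae(y, yhat):
    return sum(abs(a - b) for a, b in zip(y, yhat)) / len(y)

def r2(y, yhat):
    mean = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - mean) ** 2 for a in y)
    return 1 - ss_res / ss_tot

def select_model(y_true, predictions):
    """Pick the candidate model with the lowest RMSE on held-out prices."""
    return min(predictions, key=lambda name: rmse(y_true, predictions[name]))

# Hypothetical held-out daily prices and two candidates' predictions.
prices = [100.0, 105.0, 103.0, 110.0]
candidates = {"KNN":  [101.0, 104.0, 104.0, 109.0],
              "CART": [ 98.0, 108.0, 100.0, 114.0]}
best = select_model(prices, candidates)
print(best)  # KNN
```

The selected model's forecasts then feed the predictive-prescription optimization that re-allocates the portfolio.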
K. Abidi and K. Smaili, Loria University of Lorraine, France
In this article, we tackle the issue of sentiment analysis for three Maghrebi dialects used in social networks. More precisely, we are interested in analysing sentiments in Algerian, Moroccan and Tunisian corpora. To do this, we automatically built three sentiment lexicons, one for each dialect. Each entry of these lexicons is composed of a word, written in Arabic script (Modern Standard Arabic or dialect) or Latin script (Arabizi, French or English), together with its polarity. In these lexicons, the semantic orientation of a word, represented by an embedding vector, is determined automatically by calculating its distance to several embedding seed words. The embedding vectors are trained on three large corpora collected from YouTube. In the experiments, the proposed approach is evaluated using the few existing annotated corpora for the Tunisian and Moroccan dialects. For the Algerian dialect, in addition to a small corpus found in the literature, we collected and annotated a corpus of 10k comments extracted from YouTube. This corpus represents a valuable resource which will be offered free to the community.
Maghrebi dialect, Word embedding, Semantic orientation.
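The seed-based polarity assignment described above can be illustrated with cosine similarity between a word's embedding and positive versus negative seed embeddings: the sign of the averaged difference gives the polarity. The 3-D toy vectors below stand in for embeddings trained on the YouTube corpora and are assumptions for demonstration.

```python
import math

def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den

def orientation(word_vec, pos_seeds, neg_seeds):
    """Semantic orientation: average cosine similarity to positive seed
    vectors minus the average similarity to negative ones; a positive
    score means positive polarity."""
    pos = sum(cosine(word_vec, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(word_vec, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg

# Toy 3-D embeddings standing in for trained seed-word vectors.
pos_seeds = [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]]
neg_seeds = [[0.0, 0.1, 0.9], [0.1, 0.0, 0.8]]
word = [0.7, 0.3, 0.1]  # hypothetical dialect word vector
score = orientation(word, pos_seeds, neg_seeds)
print("positive" if score > 0 else "negative")  # positive
```

Running this over the vocabulary of each dialect's embedding space yields the per-dialect polarity lexicons, independent of whether the word is written in Arabic or Latin script.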
Copyright © CSITY 2021