Victoria Firsanova, Department of Mathematical Linguistics, Saint Petersburg State University, Saint Petersburg, Russia
In the field of inclusion, automated question answering (QA) could become an effective tool, allowing people, for example, to ask questions anonymously about interaction between neurotypical and atypical people and to get reliable information immediately. However, controlling such systems is challenging. Before QA is integrated into inclusion practice, research is required to prevent the generation of misleading or false answers and to verify that a system is safe and does not misrepresent or alter information. Although the problem of data misrepresentation is not new, the approach presented in the paper is novel because it highlights a particular NLP application in the field of social policy and healthcare. The study focuses on extractive and generative QA models based on the pre-trained Transformers BERT and GPT-2, fine-tuned on a Russian dataset for the inclusion of people with autism spectrum disorder. The source code is available on GitHub: https://github.com/vifirsanova/ASD-QA.
Natural Language Processing, Question Answering, Information Extraction, BERT, GPT-2.
Zhenshan Bao, Yuezhang Wang and Wenbo Zhang, College of Computer Science, Beijing University of Technology, Beijing, China
Named entity recognition (NER), one of the most fundamental tasks in natural language processing (NLP), has received extensive attention. Most existing approaches to NER rely on a large amount of high-quality annotations or on relatively complete lists of specific entities. In practice, however, manually annotated data is very expensive to obtain, and the available entity list is often not comprehensive. Using an entity list to annotate data automatically is a common method, but under low-resource conditions the automatically annotated data is usually imperfect and includes incompletely annotated or non-annotated data. In this paper, we propose a NER system for complex data processing that can use an entity list containing only a few entities to obtain incomplete annotation data and train the NER model without human annotation. Our system extracts semantic features from a small number of samples by introducing a pretrained language model. Based on a model trained on the incomplete annotations, we relabel the data using a cross-iteration approach. We use a data filtering method to filter the training data used in the iteration process, and re-annotate the incomplete data over multiple iterations to obtain high-quality data. Each iteration groups and processes the data according to the different types of annotations, which improves model performance faster and reduces the number of iterations. The experimental results demonstrate that our proposed system can effectively perform low-resource NER tasks without human annotation.
Named entity recognition, Low resource natural language processing, Complex annotated data, Cross-iteration.
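The entity-list annotation step described in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name, the BIO tagging scheme, the longest-match strategy and the toy entity list are all assumptions made for illustration. The point it shows is why such annotation is incomplete: any entity missing from the list stays unlabeled.

```python
def annotate_with_entity_list(tokens, entity_list):
    """Assign BIO tags by greedy longest match against a small entity list.

    entity_list maps a tuple of tokens to an entity type (hypothetical format).
    """
    labels = ["O"] * len(tokens)
    max_len = max((len(entity) for entity in entity_list), default=0)
    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            span = tuple(tokens[i:i + n])
            if span in entity_list:
                entity_type = entity_list[span]
                labels[i] = "B-" + entity_type
                for j in range(i + 1, i + n):
                    labels[j] = "I-" + entity_type
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return labels


# Toy example: "China" is a real entity but is absent from the list,
# so it keeps the "O" tag and the annotation is incomplete.
tokens = ["Beijing", "University", "of", "Technology", "is", "in", "China"]
entity_list = {("Beijing", "University", "of", "Technology"): "ORG"}
labels = annotate_with_entity_list(tokens, entity_list)
```

In a cross-iteration setting, labels produced this way would serve only as the noisy starting point that later iterations relabel.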
Renáta Nagy, Doctoral School of Health Sciences, Department of Languages for Biomedical Purposes and Communication Medical School, University of Pécs, Hungary
The presentation concerns the online assessment of English for Specific Purposes (ESP). The focus is on online testing as a possible form of language testing. The topic is timely, and its main aim is to examine the validity of online testing. A positive outcome of the study would indicate a promising future in a number of respects, not only for language assessors but for future candidates as well: namely, a basic online setup that could be used worldwide for online tests. To achieve this, the research covers not only the theoretical but also the first-hand empirical side of testing, from the point of view of both examiners and examinees. Materials and methods include surveys, needs analysis and trial versions of online tests. In this context, the presentation focuses on the possible questions, techniques and approaches to online assessment, which can also be used in language lessons as a classroom technique.
Assessment, Online, ESP, Online assessment, validity, testing.
Yangjie Dan, Fan Xu*, Mingwen Wang, School of Computer Information Engineering, Jiangxi Normal University, Nanchang 330022, China
Dialect discrimination has important practical significance for protecting the inheritance of dialects. Traditional dialect discrimination methods pay much attention to the underlying acoustic features and ignore the meaning of the pronunciation itself, resulting in low performance. This paper systematically explores the validity of pronunciation features of dialect speech, composed of phoneme sequence information, for dialect discrimination, and designs an end-to-end dialect discrimination model based on the multi-head self-attention mechanism. Specifically, we first adopt a residual convolutional neural network and the multi-head self-attention mechanism to extract the phoneme sequence features unique to different dialects and compose novel phonetic features. Then, we perform dialect discrimination based on the extracted phonetic features using the self-attention mechanism and bidirectional long short-term memory networks. The experimental results on the large-scale benchmark 10-way Chinese dialect corpus released by iFLYTEK show that our model outperforms the state-of-the-art alternatives by a large margin.
Dialect discrimination, Multi-head attention mechanism, Phonetic sequence, Connectionist temporal classification.
Siddhant Hosalikar1, Saikumar Iyer1, Ankit Limbasiya1 and Prof. Suvarna Chaure2, 1SIES Graduate School of Technology, Mumbai University, India, 2Department of Computer Engineering, Mumbai University, India
Phishing is a type of fraud involving two actors, an attacker and a victim. The attacker creates a phishing webpage that mimics a legitimate one and embeds it in a URL or any other media. Detecting malicious URLs (Uniform Resource Locators) is a difficult yet interesting problem, because attackers mostly generate the URLs randomly and researchers have to detect them while considering the behaviours behind the generated malicious URLs. Among the various detection schemes in the anti-phishing area, the URL-based scheme is safer and more realistic for one important reason: it does not require access to the malicious webpage. In this paper, our aim is to provide a comprehensive investigation of the detection of malicious URLs using machine learning algorithms. Our proposed detection system consists of URL feature extraction, algorithms and big data technology.
URL, Malicious URL detection, Feature extraction, Machine learning.
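As a rough illustration of URL-based feature extraction of the kind the abstract above refers to, the following sketch computes a few lexical features from the URL string alone, without fetching the page. The feature set and the function name are assumptions chosen for illustration; the abstract does not list the features the authors actually use.

```python
from urllib.parse import urlparse


def extract_url_features(url):
    """Compute simple lexical features from a URL string (hypothetical set)."""
    parsed = urlparse(url)
    host = parsed.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "has_at_symbol": "@" in url,
        "uses_https": parsed.scheme == "https",
        # A dotted host made only of digits suggests a raw IPv4 address.
        "host_is_ipv4": host.count(".") == 3 and host.replace(".", "").isdigit(),
    }


features = extract_url_features("http://192.168.0.1/login.php?id=42")
```

A dictionary like this could be turned into a numeric vector and passed to any standard classifier; the page itself is never requested, which is the safety property the abstract highlights.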
Temitope O Awodiji, Computer Information Science Personnel, California Miramar University, California, USA
With the fast advancement of Information and Communication Technologies (ICT) and the integration of advanced analytics into manufacturing, products, and services, several industries face new opportunities and, at the same time, the challenge of maintaining their capabilities and meeting market demands. Such integration, termed Cyber-Physical Systems (CPS), is transforming industry to the next level. CPS facilitates the systematic transformation of large volumes of data into information, which makes invisible patterns of degradation and inefficiency visible and leads to better decision-making. This project focuses on existing trends in the development of industrial big data analytics and CPS. It then briefly discusses a system architecture for applying CPS in manufacturing, referred to as 5C. The 5C architecture comprises the steps necessary to fully integrate cyber-physical systems into the manufacturing industry.
Information and Communication Technologies (ICT), Big Data, Analytic, Data, Data Architecture.
Arthur Yosef1, Eli Shnaider2 and Moti Schneider2, 1Tel Aviv-Yaffo Academic College, Israel, 2Netanya Academic College, Israel
This study presents a method for assigning relative weights when constructing Fuzzy Cognitive Maps (FCMs). We introduce a method of computing the relative weights of directed edges based on the actual past behavior (historical data) of the relevant concepts. There is also a discussion addressing the role of experts in the process of constructing FCMs. The method presented here is intuitive and does not require any restrictive assumptions. The weights are estimated during the design stage of the FCM, before the recursive simulations are performed.
FCM, relative importance (weight), Fuzzy Logic, Soft Computing, Neural Networks.
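For readers unfamiliar with the recursive simulations mentioned in the abstract above, here is a minimal sketch of one common FCM update rule, in which each concept's next activation is a sigmoid of its current value plus the weighted influence of the other concepts. This is a generic textbook variant, not the authors' weighting method; the weight matrix, initial state and stopping tolerance are hypothetical values.

```python
import math


def fcm_step(state, weights):
    """One synchronous update; weights[j][i] is the influence of concept j on i."""
    n = len(state)
    new_state = []
    for i in range(n):
        total = state[i] + sum(weights[j][i] * state[j] for j in range(n) if j != i)
        new_state.append(1.0 / (1.0 + math.exp(-total)))  # sigmoid squashing
    return new_state


def simulate(state, weights, steps=50, tol=1e-6):
    """Iterate until the state changes by less than tol or steps run out."""
    for _ in range(steps):
        next_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(next_state, state)) < tol:
            return next_state
        state = next_state
    return state


# Hypothetical three-concept map with illustrative edge weights.
weights = [
    [0.0, 0.6, -0.3],
    [0.2, 0.0, 0.5],
    [0.0, -0.4, 0.0],
]
final_state = simulate([0.5, 0.2, 0.8], weights)
```

In the workflow the abstract describes, the entries of the weight matrix would be estimated from historical data before a simulation like this is run.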
Abdul Musavvir Parappathiyil, Department of Mathematics, Pondicherry University, India
In this article, a linear pentagonal fuzzy number (PFN) is defined, together with the symmetrical and non-symmetrical forms of the linear PFN. Some basic arithmetic operations on linear PFNs, such as addition and multiplication, are presented. The concept of classical two-dimensional (2-D) pentagonal fuzzy number matrices (PFMs) is also covered. In addition, the notion of multidimensional pentagonal fuzzy number matrices (MDPFMs) is discussed, along with some of its rules and operations, such as multiplication. Finally, in light of the rules relating to both 2-D PFMs and MDPFMs, we use MDPFMs to solve the fully fuzzy linear system of equations (FFLSE) with pentagonal fuzzy numbers as inputs. Two methods for solving the FFLSE, the singular value decomposition (SVD) method and the row reduced echelon (RRE) method, are discussed with a numerical example.
MDPFMs, FFLSE for MDPFMs with RRE method, FFLSE for MDPFMs with SVD method.
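As a small illustration of PFN arithmetic, the sketch below represents a pentagonal fuzzy number by its five defining points and adds two such numbers component-wise, which is a commonly used definition of PFN addition. Both the five-tuple representation and the component-wise rule are assumptions here; the article's own definition of the linear PFN may carry additional parameters.

```python
def is_valid_pfn(pfn):
    """A five-point PFN is assumed to have non-decreasing defining points."""
    return len(pfn) == 5 and all(pfn[i] <= pfn[i + 1] for i in range(4))


def add_pfn(a, b):
    """Component-wise addition of two five-point pentagonal fuzzy numbers."""
    if not (is_valid_pfn(a) and is_valid_pfn(b)):
        raise ValueError("each PFN needs five non-decreasing points")
    return tuple(x + y for x, y in zip(a, b))


a = (1, 2, 3, 4, 5)
b = (2, 3, 5, 6, 8)
total = add_pfn(a, b)
```

Because each input has non-decreasing points, the component-wise sum is again a valid five-point PFN under this representation.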
Valerie Cross and Mike Zmuda, Computer Science and Software Engineering, Miami University, Oxford, OH USA
Current machine learning research is addressing the problem that occurs when a data set includes numerous features but the number of training examples is small. Microarray data, for example, typically has a very large number of features (the genes) compared with the number of training examples (the patients). An important research problem is to develop techniques that effectively reduce the number of features by selecting the best set of features for use in a machine learning process, referred to as the feature selection problem. Another means of addressing high-dimensional data is the use of an ensemble of base classifiers. Ensembles have been shown to improve on the predictive performance of a single model by training multiple models and combining their predictions. This paper examines combining an enhancement of the random subspace model of feature selection, which uses fuzzy set similarity measures, with different measures for evaluating feature subsets in the construction of an ensemble classifier. Experimental results show potentially useful combinations.
Feature selection, fuzzy set similarity measures, concordance correlation coefficient, feature subset evaluators, microarray data, ensemble learning.
Zhijun Chen, Department of Financial Engineering, SUSTech University, Shen Zhen, China
Sentiments are extracted from tweets with cryptocurrency hashtags to predict prices, and the sentiment prediction model generates the parameters for an optimization procedure that makes decisions and re-allocates the portfolio in a later step. After prediction, an evaluation conducted with RMSE, MAE and R² selects the KNN and CART models for predicting Bitcoin and Ethereum, respectively. During portfolio optimization, this project uses predictive prescription to make the solution robust to uncertainty while taking full advantage of auxiliary data such as sentiments. As for the outcome of the optimization, the portfolio allocation and returns fluctuate sharply, as illustrated in the figure.
Cryptocurrency Trading Portfolio, Sentiment Analysis, Machine Learning, Predictive Prescription, Robust Optimization Portfolio.
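The three evaluation metrics named in the abstract above have standard definitions, sketched below in plain Python. The toy price values are made up for illustration and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy values only; not data from the study.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
scores = {"rmse": rmse(y_true, y_pred), "mae": mae(y_true, y_pred), "r2": r2(y_true, y_pred)}
```

Lower RMSE and MAE and higher R² indicate a better fit, which is how a comparison across candidate models such as KNN and CART would typically be read.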
K. Abidi and K. Smaili, Loria University of Lorraine, France
In this article, we tackle the issue of sentiment analysis for three Maghrebi dialects used in social networks. More precisely, we are interested in analysing sentiments in Algerian, Moroccan and Tunisian corpora. To do this, we automatically built three sentiment lexicons, one for each dialect. Each entry of these lexicons is composed of a word, written in Arabic script (Modern Standard Arabic or dialect) or Latin script (Arabizi, French or English), together with its polarity. In these lexicons, the semantic orientation of a word, represented by an embedding vector, is determined automatically by calculating its distance to several embedded seed words. The embedding vectors are trained on three large corpora collected from YouTube. In the experimental section, the proposed approach is evaluated using a few existing annotated corpora for the Tunisian and Moroccan dialects. For the Algerian dialect, in addition to a small corpus we found in the literature, we collected and annotated a corpus of 10k comments extracted from YouTube. This corpus represents a valuable resource that will be made freely available to the community.
Maghrebi dialect, Word embedding, Semantic orientation.
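The idea of deriving a word's semantic orientation from its distance to seed words can be sketched as follows. This is an illustrative assumption, not the authors' exact formula: the score here is the mean cosine similarity to positive seeds minus the mean cosine similarity to negative seeds, and the two-dimensional vectors are toy values rather than trained embeddings.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def polarity_score(word_vec, pos_seeds, neg_seeds):
    """Positive score: closer to positive seeds; negative: closer to negative ones."""
    pos = sum(cosine(word_vec, s) for s in pos_seeds) / len(pos_seeds)
    neg = sum(cosine(word_vec, s) for s in neg_seeds) / len(neg_seeds)
    return pos - neg


# Toy 2-D "embeddings" standing in for vectors trained on a real corpus.
pos_seeds = [[1.0, 0.1], [0.9, 0.2]]
neg_seeds = [[-1.0, 0.1], [-0.8, -0.1]]
score_a = polarity_score([0.7, 0.3], pos_seeds, neg_seeds)
score_b = polarity_score([-0.6, 0.2], pos_seeds, neg_seeds)
```

Thresholding a score like this would give each lexicon entry its polarity label.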
Guangjie Li, Yi Tang, Biyi Yi, Xiang Zhang and Yan He, National Innovation Institute of Defense Technology, Beijing, China
Code completion is one of the most useful features provided by advanced IDEs and is widely used by software developers. However, one kind of code completion, recommending arguments for method calls, is less used. Most existing argument recommendation approaches provide a long list of syntactically correct candidate arguments, from which it is difficult for software engineers to select the correct ones. To this end, we propose a deep learning based approach to recommending arguments instantly when programmers type in the names of the methods they intend to invoke. First, we extract context information from a large corpus of open-source applications. Second, we preprocess the extracted dataset, which involves natural language processing and data embedding. Third, we feed the preprocessed dataset to a specially designed convolutional neural network (CNN) to rank and recommend actual arguments. With the resulting CNN model trained on sample applications, we can sort the candidate arguments in a reasonable order and recommend the first one as the correct argument. We evaluate the proposed approach on 100 open-source Java applications. Results suggest that the proposed approach outperforms the state-of-the-art approaches in recommending arguments.
Argument recommendation, Code Completion, CNN, Deep Learning.
Copyright © CSITY 2021