- Data Mining Applications in Science, Engineering, Healthcare and Medicine
Session Introduction
Jurg Ott
Rockefeller University, USA
Title: Frequent pattern mining of genotypes underlying digenic traits
Biography:
Jürg Ott completed his PhD at the University of Zürich, with postdoctoral studies in Medical Genetics at the University of Washington in Seattle, USA. He is the director of the Laboratory of Statistical Genetics at Rockefeller University, New York. He has published more than 400 papers in reputed journals and is the recipient of various prestigious awards, including the Allan Award from the American Society of Human Genetics.
Abstract:
Some genetic diseases (“digenic traits”) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa occur in the presence of two mutant variants, one each in the ROM1 and RDS genes, while the occurrence of only one such variant results in a normal phenotype. Detecting digenic traits by standard genetic methods is difficult, but frequent pattern mining (FPM) methods offer a solution. Let Y=2 refer to cases (with disease) and Y=1 to control individuals, with X denoting a specific genotype pattern. We need association rules, “X→Y”, with high confidence, P(Y=2|X), higher than the proportion P(Y=2) of cases. We use fpgrowth as the basic FPM engine and have built a permutation-based framework around it to find significant high-frequency digenic genotype patterns. Application to a published dataset on opioid dependency furnished results that could not have been found with classical genetic methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. Single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern, with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and in none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease.
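The permutation framework described above can be sketched as follows. The individuals, variant names, and candidate pattern here are toy assumptions, not the authors' dataset; in practice an FPM engine such as fpgrowth would enumerate the frequent candidate patterns, and this sketch only shows the confidence and permutation-test steps.

```python
import random

# Hypothetical toy data: each individual is a set of (variant, genotype) items
# plus a label (2 = case, 1 = control). Purely illustrative values.
individuals = (
    [({("v1", "het"), ("v2", "het")}, 2) for _ in range(14)] +
    [({("v1", "het"), ("v3", "hom")}, 2) for _ in range(10)] +
    [({("v2", "het"), ("v3", "hom")}, 1) for _ in range(20)]
)

def confidence(pattern, data):
    """Confidence P(Y=2 | X): fraction of pattern carriers that are cases."""
    carriers = [y for x, y in data if pattern <= x]
    return sum(1 for y in carriers if y == 2) / len(carriers) if carriers else 0.0

def permutation_p(pattern, data, n_perm=1000, seed=0):
    """Permutation p-value: shuffle case/control labels, recompute confidence."""
    rng = random.Random(seed)
    observed = confidence(pattern, data)
    labels = [y for _, y in data]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        shuffled = [(x, y) for (x, _), y in zip(data, labels)]
        if confidence(pattern, shuffled) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)

# A digenic pattern: heterozygous genotypes at two different variants.
pattern = frozenset({("v1", "het"), ("v2", "het")})
print(confidence(pattern, individuals))          # all carriers are cases
print(permutation_p(pattern, individuals))
```

The permutation step preserves the genotype data and breaks only the genotype–phenotype link, which is what makes the resulting p-value valid despite the many patterns screened.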
- Data Mining and Machine Learning
Session Introduction
T. Velmurugan
D. G. Vaishnav College, India
Title: Evaluation of deep learning models for predicting the support and resistance levels in stock market
Biography:
Dr. T. Velmurugan is an Associate Professor in the PG and Research Department of Computer Science, D. G. Vaishnav College, Chennai, India, and is also the Advisor and Head of the Department of Computer Applications. He holds a Ph.D. in Computer Science from the University of Madras and has 27 years of teaching experience. He has guided more than 300 M.Phil. research scholars and 13 Ph.D. scholars and has published more than 110 articles in SCOPUS- and SCI-indexed journals. He was elected to and served as a Senate Member from the Academic Council, University of Madras, served as a nominated Senate Member of the Middle East University, Dubai, UAE, and is an Editorial Board Member of 5 international journals. He has been an invited and keynote speaker at many international conferences around the world, and he is a member of the Board of Studies for many autonomous institutions and universities in India. His h-index is 17 and his i10-index is 24.
Abstract:
Statement of the Problem: Deep neural networks surpass many of the limitations of classical machine learning algorithms. Although the performance of neural networks, especially recurrent neural networks, for predictive analysis is widely accepted by data science analysts, they still suffer from the vanishing gradient problem. Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) networks are two modified recurrent architectures that resolve this issue. This study evaluates LSTM against GRU to find the better fit when modeling the support and resistance levels in stocks retraced by Fibonacci percentages.
Methodology & Theoretical Orientation: LSTM and GRU models were built for two historical datasets taken from the Indian stock exchange. The models were trained on 4,016 of the 5,021 total instances; the remaining 1,005 instances were used for testing. Error metrics were used to assess accuracy, and hyperparameters were set to control the training process. Graphs of the error residuals were produced, and the time taken to train each model was recorded as its computation time.
Findings: For both models, the difference between predictions on the unseen test data and the actual values is less than the lowest value of the dependent variable, which establishes the level of accuracy; this was validated through graphs. Except for one, the hyperparameters were set to the same values for both models to achieve this accuracy. The computation time for GRU is higher than for LSTM.
Conclusion & Significance: Both models achieved a high level of accuracy, but the residual values for LSTM were lower than those for GRU. All but one of the hyperparameters for GRU were set to values higher than those for LSTM, so the computation time for GRU is greater, leading to the conclusion that LSTM is a better fit than GRU.
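As a rough illustration of how the two architectures differ in cost, the sketch below (illustrative layer sizes, not the study's actual settings) counts trainable parameters for one recurrent layer using the standard gate structure: an LSTM carries four gate weight blocks while a GRU carries only three, so with identical hyperparameters a GRU is lighter per unit; the higher GRU computation time reported here stems from its larger hyperparameter settings.

```python
# Parameter count for one recurrent layer under the textbook formulation:
# each gate has an input weight matrix, a recurrent weight matrix, and a bias.
# (Framework implementations, e.g. Keras GRU with reset_after=True, may add
# extra bias terms; this sketch uses the classic formula.)
def rnn_params(kind, input_dim, hidden_units):
    """gates * (W_x + W_h + bias) trainable parameters."""
    gates = {"lstm": 4, "gru": 3}[kind]
    per_gate = hidden_units * input_dim + hidden_units * hidden_units + hidden_units
    return gates * per_gate

# Hypothetical univariate price series fed to a 50-unit layer.
lstm_p = rnn_params("lstm", input_dim=1, hidden_units=50)
gru_p = rnn_params("gru", input_dim=1, hidden_units=50)
print(lstm_p, gru_p)  # LSTM has 4 gate blocks, GRU only 3
```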
- Cloud Computing
Session Introduction
Vinod Kumar Verma
Sant Longowal Institute of Engineering and Technology, India
Title: Big Data focused information mining on pandemic era: COVID-19 analysis, challenges and impacts
Biography:
Dr. Vinod Kumar Verma is an Assistant Professor in the Department of Computer Science and Engineering at Sant Longowal Institute of Engineering and Technology, Longowal, Punjab, India. He has published many research papers in international journals of IEEE, Springer, Elsevier, and Taylor and Francis. He also serves as an Editorial Board Member / Reviewer for many international journals, including IEEE Sensors Journal; IEEE Transactions on Neural Networks and Learning Systems; Elsevier Computer Networks: The International Journal of Computer and Telecommunications Networking; Springer EURASIP Journal on Wireless Communications and Networking; Springer Journal of Computers in Education; The Journal of Systemics, Cybernetics, and Informatics (JSCI), IIIS, Florida, USA; International Journal of Communication Systems, Wiley; and International Journal of Distributed Sensor Networks (IJDSN), SAGE Publishing, UK. Dr. Verma has served many international conferences in different capacities in the UK (2012), USA (2014), Japan (2015), Italy (2016), Australia (2017), and France (2018).
Abstract:
In the modern era, information technology based applications significantly affect how we handle global pandemics like COVID-19. Research areas such as big data and data mining can serve as the platform for executing these solutions. Such applications touch daily life activities and help us deal with epidemiological situations. COVID-19 emerged as a global pandemic in 2019. From the onset of the COVID-19 era to the present, there has been a huge shift in the progress of society across different sectors, including healthcare, education, transportation, and the military. Dramatic changes have been observed in the economy after COVID-19, and the steep fall in the economies of different nations has been analysed in the past. Now the entire world is recovering from this pandemic, and economies are recovering with time. Data mining and analysis have a significant role in planning forthcoming applications to deal with such pandemic situations. Machine learning and online analytical processing can be applied to extract pandemic-related information such as symptoms, the number of patients affected, recovery rates, and death counts. This information can help society only if it is stored in databases across the globe, and it can be evaluated only if rich data from different sources is available in repositories. From these repositories, information can be retrieved to deal with pandemic situations and to plan corrective and preventive measures. Data science serves as an effective way to cope with problematic situations like COVID-19 and can be extended to spatial databases for extracting information related to epidemiological diseases. Technology-compliant solutions are required to handle pandemic situations, and recommendations have been suggested to incorporate technology-based applications.
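As a minimal illustration of the OLAP-style roll-up described above, the sketch below (toy records; the field names and values are assumptions, not real pandemic data) aggregates case records by region and status and derives a recovery rate from the resulting cube:

```python
from collections import defaultdict

# Hypothetical patient records; in practice these would come from
# distributed repositories across regions.
records = [
    {"region": "A", "status": "recovered"},
    {"region": "A", "status": "active"},
    {"region": "B", "status": "recovered"},
    {"region": "B", "status": "deceased"},
    {"region": "B", "status": "recovered"},
]

# OLAP-style roll-up: one count per (region, status) cell.
cube = defaultdict(int)
for r in records:
    cube[(r["region"], r["status"])] += 1

def recovery_rate(region):
    """Share of a region's records with status 'recovered'."""
    total = sum(n for (reg, _), n in cube.items() if reg == region)
    return cube[(region, "recovered")] / total

print(recovery_rate("B"))  # 2 of region B's 3 records are recovered
```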
Anuj Kumar
Apeejay School of Management, India
Title: Exploring adoption and usage of cloud computing in SMEs
Biography:
Prof. Anuj Kumar is currently an Assistant Professor at the Apeejay School of Management, Dwarka, Delhi. He is pursuing an Executive Ph.D. in Management from Aligarh Muslim University (a Central University). He holds a double master’s degree in management with a specialization in Marketing and International Business, and he completed an M.Sc. in International Business at University College Dublin, Ireland (ranked among the top 200 universities in the world, QS World Rankings). He has published research papers in reputed SCOPUS/Web of Science/ABDC/UGC Care journals, with more than 40 publications, one patent, three e-books, and 4 book chapters to his credit. He has attended more than 30 conferences (National/International/ICSSR) and 30 FDPs. He is an associate editor for the Academy of Marketing Studies Journal (ABDC-B) and has organized various FDPs and MDPs. Prof. Kumar has been invited as a session chair, resource person, and judge at various conferences and FDPs.
Abstract:
In this document, the author discusses the adoption and importance of cloud computing from the perspective of SMEs. Accelerated growth of SMEs is required for any developed or developing nation, and technology is the only way to help SMEs achieve that growth with the resources available. The author discusses the theoretical frameworks and adoption factors identified in previous literature and also examines the major drivers of cloud computing adoption; Neves et al. (2011), for example, argued for cloud computing in SMEs. The author reviews the literature and analyzes the adoption of cloud computing from the perspective of Political, Economic, Social, and Technological factors. The political factors leading towards the adoption of cloud computing are data protection, favorable policies towards SME technology adoption, and reduction in carbon emissions. The economic factors leading towards adoption are innovation, competitive advantage, flexible pricing, and improvements in cost and productivity. The social factors are the user-friendly nature of the technology, cooperation in information technology, and a modern culture that supports the use of cloud computing. The technological factors are flexibility, reliability, and resistance to change. Cloud computing can make SMEs more competitive and can also help them improve their worth. SMEs do not have the luxury of abundant financial resources; such organizations need to grow with limited resources, and cloud computing helps SMEs achieve the desired outcomes with optimum utilization of the resources available. With the adoption of cloud computing, it is easy for organizations to work with employees at remote locations, and it also helps in business continuity and flow (Carcary et al., 2014). Most researchers have found cloud computing important for saving small firms money.
They also note its mobility, quick access from anywhere, and security, and that the overall technological infrastructure is much simpler. In recent times, its importance has grown considerably because it solves SMEs' problems. Cloud computing and technology are the future; firms realize this and are making their employees technically skilled in the use of cloud computing.
Hossam Elshahaby
Cairo University, Egypt
Title: An end to end system for subtitle text extraction from movie videos
Biography:
Hossam Elshahaby is affiliated to Cairo University, Egypt. He is a recipient of many awards and grants for his valuable contributions and discoveries in major area of Artificial Intelligence. His international experience includes various programs, contributions and participation in different countries for diverse fields of study. His research interests reflect in his wide range of publications in various national and international journals in Big Data.
Abstract:
We present a new technique for detecting text inside a complex graphical background, extracting it, and enhancing it so that it can be easily recognized by optical character recognition (OCR). The technique uses a deep neural network for feature extraction and for classifying each frame as text-containing or not. An Error Handling and Correction (EHC) technique is used to resolve classification errors, and a Multiple Frame Integration (MFI) algorithm is introduced to extract the graphical text from its background. Text enhancement is done by adjusting the contrast, minimizing noise, and increasing the pixel resolution. A standalone Component-Off-The-Shelf (COTS) software package is used to recognize the text characters and to qualify the system performance. The proposed solution generalizes to multilingual text, and a newly created benchmark dataset containing videos in different languages was collected for this purpose. A new HMVGG16 Convolutional Neural Network (CNN), used for classifying frames as text-containing or non-text-containing, achieves an accuracy of 98%. The introduced system's weighted-average caption extraction accuracy is 96.15%, and the Correctly Detected Characters (CDC) average recognition accuracy using the Abbyy SDK OCR engine is 97.75%.
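The intuition behind multiple frame integration can be sketched as follows: subtitle text is static across consecutive frames while the background moves, so averaging frames keeps text pixels sharp while the background smooths out. This is a toy illustration of the idea, not the paper's actual MFI algorithm; frames are 1-D grayscale rows standing in for 2-D images.

```python
# Average pixel values across a run of consecutive frames.
def integrate(frames):
    n = len(frames)
    width = len(frames[0])
    return [sum(f[i] for f in frames) / n for i in range(width)]

# Hypothetical frames: pixels 0-1 are static white subtitle text (255);
# pixels 2-4 are a moving background whose values shift every frame.
frames = [
    [255, 255, 10, 200, 90],
    [255, 255, 200, 90, 10],
    [255, 255, 90, 10, 200],
]
avg = integrate(frames)
print(avg)  # text pixels stay at 255; background pixels converge to 100
```

After integration, a simple threshold separates the stable high-contrast text region from the blurred background before handing the region to the OCR engine.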
Mekranfar Zohra
Centre des Techniques Spatiales Arzew Oran, Algeria
Title: Integrating Multi-Criteria Decision Making (MCDM) and Spatial Data Warehouses (SDW) in GIS
Biography:
I am an engineer at the Space Technology Centre, and I hold a master's degree specializing in geographic information systems. I am mainly interested in data mining and spatial OLAP, and I use different techniques of spatial information systems and SOLAP. I am at ease with data warehouses, and during my work I have taken a great interest in geomatics. In addition, I have been able to use several tools through GIS software such as ArcGIS and QGIS.
Abstract:
This work aims to develop MCDM and SDW methods that will be integrated into a GIS following a "GIS dominant" approach, with the GIS operating tools used to operate the SDW. MCDM methods can provide solutions to problems with various and multiple criteria. When the problem is complex and involves a spatial dimension, it makes sense to combine the MCDM process with other approaches such as data mining and ascending analyses. In this paper, we present an experiment showing a geo-decisional methodology for SDW construction. On-Line Analytical Processing (OLAP) technology, which combines basic multidimensional analysis with the concepts of data mining, provides powerful tools to reveal inductions and information that are not obvious with traditional tools; however, these OLAP tools become more complex in the presence of the spatial dimension. The integration of OLAP with a GIS is the future solution for geographic and spatial information. GIS offers advanced functions for the acquisition, storage, analysis, and display of geographic information, but its effectiveness for complex spatial analysis is questionable due to its determinism and decisional rigor. A prerequisite for implementing any analysis or exploration of spatial data is the construction and structuring of a spatial data warehouse (SDW). This SDW must be easily usable both by the GIS and by the tools offered by an OLAP system.
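A minimal sketch of the MCDM side of such a system, assuming a simple weighted-sum model over hypothetical criteria and candidate sites (in practice the criteria scores would be derived from SDW/GIS layers, and richer MCDM methods such as outranking could replace the weighted sum):

```python
# Hypothetical decision criteria and their weights (weights sum to 1).
criteria_weights = {"accessibility": 0.5, "cost": 0.3, "slope": 0.2}

# Hypothetical candidate sites with normalized scores in [0, 1]
# (higher is better for every criterion after normalization).
sites = {
    "site_1": {"accessibility": 0.9, "cost": 0.4, "slope": 0.8},
    "site_2": {"accessibility": 0.6, "cost": 0.9, "slope": 0.5},
}

def weighted_score(scores):
    """Weighted-sum aggregation of one site's criteria scores."""
    return sum(criteria_weights[c] * v for c, v in scores.items())

# Rank candidate sites from best to worst aggregate score.
ranking = sorted(sites, key=lambda s: weighted_score(sites[s]), reverse=True)
print(ranking)
```

Normalizing each criterion to a common scale before weighting is what makes the aggregate scores comparable across criteria measured in different units.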