- Smart Manufacturing
- AI-Driven Business
- Privacy and Security
- Data+X Projects
Software Dependability and Security
Software dependability and security are critical to assuring the resilience of these complex systems. Despite decades of work in this area, software remains a weak link in system integrity, leading to failures that compromise safety, impose financial costs, or both. The challenge is both critically important and immense. We believe progress is best made through a new approach that focuses on mitigating the types of software bugs that are most difficult to address with conventional methods, and the team we have assembled is singularly well qualified to pursue this path. To meet the challenge, the proposed program will provide education on software faults, failures, and their mitigation during the development cycle and, in particular, during system operation. We have pioneered a course for graduate students, young researchers, and software engineers studying or working in the software engineering field. This new course plays an important role in both the master's program in electrical and computer engineering and the undergraduate program in interdisciplinary data science at Duke Kunshan University.
Environment Diversity-based Software Fault Tolerance and Its Applications
Modern life depends on devices and systems containing a moderate to significant amount of software, whose reliability is critical to the reliability of the system as a whole. Software fault tolerance has hitherto been based on design diversity, whose high implementation cost has largely limited its application to safety-critical systems. This project studies affordable software fault tolerance using the newer notion of environmental diversity. The key idea is predicated on the existence of elusive software faults, known as environment-dependent bugs or Mandelbugs, whose manifestation is transient. Because such faults manifest only under particular environmental conditions, re-executing the same software after the environment has changed can avoid the failure. The environment of a software system is taken here to mean operating system resources and other concurrently running applications. The project focuses on four research aspects: environmental factor identification, key environmental control techniques, environmental diversity-based fault tolerance approaches, and applications to Android systems. The work involves failure data analysis, experiments with accelerated life testing, and analytic modeling and optimization techniques for open source software. The results are expected to reduce the cost of software fault tolerance while limiting the impact of environment-dependent bugs on software reliability and availability. They should also contribute to the emergence and development of research on environment-dependent bugs in software engineering.
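The retry-in-a-changed-environment idea can be illustrated with a minimal Python sketch. It is not the project's implementation: the function name `run_with_environmental_diversity`, the simulated `env` dictionary, and the `release_memory` step are all hypothetical and stand in for real environmental control techniques such as freeing resources or restarting a process.

```python
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def run_with_environmental_diversity(
    operation: Callable[[], T],
    environment_changes: Sequence[Callable[[], None]],
) -> T:
    """Run `operation`; after each failure, change the environment and retry.

    The same code is re-executed every time. Only the environment differs,
    which is what distinguishes this from design diversity.
    """
    last_error: Optional[Exception] = None
    # The first attempt (None) runs in the unchanged environment.
    for change in [None, *environment_changes]:
        if change is not None:
            change()
        try:
            return operation()
        except Exception as error:
            last_error = error
    raise RuntimeError("operation failed in every environment") from last_error


# Hypothetical simulated environment: the fault shows up only when memory is low.
env = {"free_memory_mb": 10}


def flaky_operation() -> str:
    if env["free_memory_mb"] < 100:
        raise MemoryError("simulated environment-dependent failure")
    return "ok"


def release_memory() -> None:
    # Stand-in for a real control action (for example, restarting a service).
    env["free_memory_mb"] = 500


result = run_with_environmental_diversity(flaky_operation, [release_memory])
```

Here the first attempt fails, the environment is altered, and the unchanged operation then succeeds. No second implementation of the software is required.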
Audio Speech and Language Processing
Prof. Ming Li and his lab conduct research in audio, speech, and language processing. In the 2020 calendar year, they published more than 10 papers in top conferences and journals in this field. Topics include speaker recognition, speaker diarization, speech synthesis, spoken term detection, and paralinguistic speech attribute recognition, among others. They have collaborated with multiple industry leaders and local companies on joint research and technology transfer.
Multimodal Behavior Signal Analysis
Prof. Ming Li and his lab conduct research in multimodal behavior signal analysis and interpretation, aimed at AI-assisted diagnosis of Autism Spectrum Disorder (ASD). They have developed an AI studio for the early screening of ASD. The studio’s four walls are programmable projection screens that can recreate a variety of settings, such as a forest environment, with sound delivered through multichannel audio equipment. The therapist can use the studio to interact with the child, for example by asking the child to point at a certain object projected onto the wall and observing the reaction. At the same time, cameras capture the movements of the child and the therapist, including gestures, gazes, and other actions. The studio is equipped with more than 10 technologies that have obtained or are in the process of obtaining patents. These include technologies that assist with gaze detection, human pose estimation, face detection, face recognition, speech recognition, and paralinguistic attribute detection.
Distributed AI Algorithm and Platform for Edge Computing
In this research project, we have designed two efficient end-to-cloud collaborative intelligence frameworks. The first is PCCNN, a Convolutional Neural Network (CNN) partitioning method. It first compresses a CNN to generate new layers that can serve as candidate partitioning points, then trains prediction models to find an optimal partitioning point, and finally splits the compressed CNN model into two parts, deployed on the terminal device and in the cloud, respectively.
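To make the notion of a partitioning point concrete, the sketch below picks the split that minimizes end-to-end latency by exhaustive search. This is a simplification and not PCCNN itself, which trains prediction models to find the point. The function name, the per-layer latency and output-size figures, and the bandwidth value are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def best_partition_point(
    device_ms: Sequence[float],
    cloud_ms: Sequence[float],
    output_kb: Sequence[float],
    input_kb: float,
    bandwidth_kb_per_ms: float,
) -> Tuple[int, float]:
    """Return (k, latency_ms): layers [0, k) run on the device, [k, n) in the cloud."""
    n = len(device_ms)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        if k == n:
            # Everything runs on the device, so nothing is uploaded.
            transfer_ms = 0.0
        else:
            # Upload the raw input (k == 0) or the output of layer k - 1.
            payload_kb = input_kb if k == 0 else output_kb[k - 1]
            transfer_ms = payload_kb / bandwidth_kb_per_ms
        latency = sum(device_ms[:k]) + transfer_ms + sum(cloud_ms[k:])
        if latency < best_latency:
            best_k, best_latency = k, latency
    return best_k, best_latency


# Made-up measurements for a three-layer model.
k, latency_ms = best_partition_point(
    device_ms=[4.0, 6.0, 8.0],
    cloud_ms=[1.0, 1.5, 2.0],
    output_kb=[200.0, 20.0, 1.0],
    input_kb=600.0,
    bandwidth_kb_per_ms=10.0,
)
print(k, latency_ms)  # 2 14.0
```

With these numbers, splitting after the second layer wins because that layer's output is small enough to upload cheaply while the slow third layer still runs in the cloud.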
The second is a hierarchical data filtering framework for distributed deep neural networks, called dfDDNNs, which avoids unnecessary transmission and cloud computing costs. Based on depthwise separable convolutions, we design a lightweight data filtering module that identifies and filters out data the cloud cannot recognize. Extensive experimental results demonstrate that the designed data filtering module reaches up to 83.18% accuracy in identifying worthless data, and that the proposed hierarchical data filtering framework can save up to 63.07% of bandwidth.
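Depthwise separable convolutions make a module lightweight because they need far fewer weights than a standard convolution. The short calculation below shows this. The layer sizes are arbitrary examples rather than the dfDDNNs configuration, and bias terms are ignored.

```python
def standard_conv_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a standard kernel x kernel convolution (no bias)."""
    return kernel * kernel * c_in * c_out


def depthwise_separable_params(kernel: int, c_in: int, c_out: int) -> int:
    """Weights in a depthwise convolution followed by a 1x1 pointwise convolution."""
    depthwise = kernel * kernel * c_in  # one kernel x kernel filter per input channel
    pointwise = c_in * c_out            # 1x1 convolution that mixes channels
    return depthwise + pointwise


standard = standard_conv_params(kernel=3, c_in=64, c_out=128)
separable = depthwise_separable_params(kernel=3, c_in=64, c_out=128)
print(standard, separable)  # 73728 8768
```

For this example layer the separable form uses roughly 8.4 times fewer weights, which is why it suits a filter that must run on a resource-constrained device.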
Predicting the Risk of Rupture for Vertebral Aneurysm Based on Geometric Features of Blood Vessels
A significant proportion of the adult population worldwide suffers from cerebral aneurysms. In this project, we investigate the possibility of using machine learning algorithms to predict the rupture risk of vertebral artery fusiform aneurysms based on geometric features of the blood vessels surrounding, but excluding, the aneurysm. A decision tree model using two features (the standard deviation of the eccentricity of the proximal vessel, and the diameter at the distal endpoint) achieved 83.8% classification accuracy. Support vector machine and logistic regression models also achieved 83.8% accuracy with a different pair of features (the ratio of mean curvature between the distal and proximal parts, and the diameter at the distal endpoint). When these three features are combined with the integration of curvature of the proximal vessel and the ratio of the mean cross-sectional area between the distal and proximal parts, the models achieve 94.6% accuracy. These results strongly suggest that geometric features are useful in predicting the risk of rupture.
Data Analytics for Smart Manufacturing
To ensure high quality and yield, today’s advanced manufacturing systems are equipped with thousands of sensors that continuously collect measurement data for process monitoring, defect diagnosis, and yield learning. In particular, the recent adoption of Industry 4.0 has promoted a set of enabling technologies for low-cost sensing, processing, and storage of manufacturing process data. While the manufacturing industry has generated a large amount of data, statistical algorithms, methodologies, and tools are urgently needed to process these complex, heterogeneous, and high-dimensional data in order to address the issues posed by process complexity, process variability, and capacity constraints. The objective of this project is to explore the enormous opportunities for data analytics in the manufacturing domain and to provide data-driven solutions for manufacturing cost reduction.
Digital Marketing Based on Data Analytics
In this project, we collaborate closely with leading domestic enterprises. Using sales and logistics data, we provide customers with guidance on pricing and discounts across all categories of products. The project ties in with new retailing, applying data-driven methodology to every stage from production to sales and providing advice on enterprise data management.
Intelligent Seal Recognition and Authentication Based on Deep Learning
In this project, we aim to verify the stamps on a scanned voucher by comparing their images with the pre-saved (i.e., true) copies. The proposed algorithm flow is composed of two major steps: (i) extracting the binary masks of the stamp images from the scanned voucher and the true copy, respectively, and (ii) comparing the resulting binary masks while accounting for environmental non-idealities such as shifting, rotation, scaling, illumination variations, and background noise.
To facilitate robust seal recognition and authentication, a number of novel techniques based on deep learning have been proposed. First, the problem of binary mask extraction is cast as a semantic segmentation task. By building an appropriate encoder-decoder based on a convolutional neural network (CNN), improving the loss function for classification, and exploiting data augmentation, the proposed approach can accurately and efficiently extract the required binary masks for different colors and shapes in the presence of large-scale illumination variations and background noise. Second, once the binary masks are available, a set of deep neural networks (DNNs) is further developed for efficient seal recognition and authentication, as shown in the following figure. In the proposed network architecture, a CNN is used for keypoint detection, a graph neural network-based image registration is adopted to match the two masks from the scanned voucher and the true copy and, finally, a DNN is trained for error classification in order to generate the authentication outcome.
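As a point of comparison for step (ii), the sketch below compares two binary masks with a plain intersection-over-union (IoU) score and a brute-force search over small translations. It is a deliberately simple baseline and not the learned registration pipeline described above. It handles only shifting, not rotation or scaling, and every function name and the toy masks are illustrative assumptions.

```python
from typing import List, Set, Tuple

Pixel = Tuple[int, int]


def mask_to_pixels(mask: List[List[int]]) -> Set[Pixel]:
    """Collect the (row, col) coordinates of all foreground pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    """Intersection over union of two pixel sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def best_shifted_iou(
    scanned: List[List[int]], reference: List[List[int]], max_shift: int = 2
) -> float:
    """Best IoU over all integer translations of up to max_shift pixels."""
    scanned_px = mask_to_pixels(scanned)
    reference_px = mask_to_pixels(reference)
    best = 0.0
    for dr in range(-max_shift, max_shift + 1):
        for dc in range(-max_shift, max_shift + 1):
            shifted = {(r + dr, c + dc) for r, c in scanned_px}
            best = max(best, iou(shifted, reference_px))
    return best


# Toy masks: the same 2x2 stamp, offset by one pixel in each direction.
reference = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
scanned = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

raw_score = iou(mask_to_pixels(scanned), mask_to_pixels(reference))
aligned_score = best_shifted_iou(scanned, reference)
```

Without alignment the two masks overlap in a single pixel, so the raw score is low even though the stamps are identical. After the shift search the score is 1.0, which shows why the comparison step must compensate for such non-idealities before judging authenticity.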
Two-Way Street: Cultural Exchanges Between the Chinese-Speaking World and the Portuguese- and Spanish-Speaking Worlds
The project, which combines Data Science with the Humanities, seeks to map out every Chinese book that has been officially published in Portuguese or Spanish (be it a translation from Chinese or a work on a topic relating to the Chinese-speaking world), as well as every Portuguese or Spanish book that has been officially published in Chinese (be it a translation from Portuguese or Spanish, or a work on a topic relating to the Portuguese- and Spanish-speaking worlds). No comprehensive studies on this topic have been carried out anywhere in the world.
The objective of the project is to map out the cultural exchanges among these three spheres, to establish a comprehensive chronology of what has been published and where, and to draw conclusions from these data.
- Liaise with National Libraries
- Follow-up on published titles with missing information
- Extract, clean, and structure data from the library catalogues
- Establish the chronology of the publications
We are currently in our first round of data extraction and are seeking further cooperation from libraries in Chinese-speaking regions.
The Mystery of China Innovation Quality
Technological progress propels economic growth. China’s economy grew dramatically, at an average annual rate of 8.7 percent, from 1980 to 2015. In 2019, China became the world leader in international patent applications with more than 1.4 million applications, overtaking the United States. Open but important research questions include how the quality gap in patented innovation between China and other leading innovator countries has evolved, how that gap relates to industrial and public policies directed at science and technology, and what role is played by the innovation network formed by inventors or assignees. This research seeks to provide quantitative evidence on these and related questions. To this end, we have assembled a universe of patent applications from major patent offices around the globe for the 1990-2019 period. We have retrieved more than 30 million patent applications, the associated 10 billion citations, and other valuable textual information.
The Chinese Factory Project: A Data Analytics and Digital History Project
The Chinese Factory Project (CFP) is a multidisciplinary data analytics and digital history project designed to collect, analyze, and publicize archival and quantitative data sources on industrial factories in modern China. Rooted in a wealth of primary economic and industrial data sources, both quantitative and archival, the CFP is developing an original database containing a sample of up to 2,000 factory cases. In the academic year 2020-2021, faculty director Dr. Zhaojin Zeng led a team of undergraduate students from Political Economy and Global China Studies to collect primary materials, refine digital archives, and expand the data size. An article on the factory data that represents the CFP's recent research outcomes, co-authored by Dr. Zeng and two students, is forthcoming in the leading business history journal Entreprises et Histoire.
Online Environmental Communication in China: A quantitative and qualitative analysis of internet text data
The project examines the role of the internet and social media in environmental communication in China. It aims at understanding how, on the one hand, the Chinese state has come to use the internet and social media to communicate about its policies and promote its actions in the field of environment, and, on the other hand, how Chinese people have come to use the internet to access legal information to resolve the environmental problems they face locally.
The first component of the project analyzes the communication strategy of Chinese local Environmental Protection. The project studies the online and offline dynamics of environmental disputes between the political actors and local environmental activists.
The second component of the project focuses on how the internet has eased access to environmental law for ordinary Chinese citizens. The collected data corpus consists of almost 4,000 questions on environmental issues put forward by citizens to the public online legal advice platform China.findlaw.cn, together with the answers posted by online lawyers in response. The project originally aimed to produce a combined quantitative and content analysis of the data.