Browsing by Author "Ajibola, Alim Sabur"
Now showing 1 - 2 of 2
Publication: Development of stuttered speech reconstruction system (Kuala Lumpur: International Islamic University Malaysia, 2017). Ajibola, Alim Sabur.

Speech in adults is characterized by the production of about 14 different sounds per second through the coordinated action of roughly 100 muscles connected by spinal and cranial nerves. Only 5-10% of the human population have a completely normal form of oral communication with respect to the many features of speech and a healthy voice. Stuttering is an involuntary disruption of the normal flow of speech by dysfluencies, which include repeated, prolonged, and blocked or stalled pronunciation at the phoneme or syllable level. The aim of this research is to design and develop a speech reconstruction system for stuttered speech. Autoregressive (AR) and Autoregressive Moving Average (ARMA) models were used to model stuttered speech. The generally poor performance of both linear models motivated the use of a nonlinear model, the simplest of which is the Nonlinear Autoregressive (NAR) neural network. The Akaike Information Criterion (AIC) was computed for the first 60 AR model orders for every sample; the order with the lowest AIC value gives the model that best fits the signal. No relationship was observed between the types of stuttering present in each sample and the generated models. The NAR neural network models generally had the lowest Mean Square Error (MSE) values, ranging from 10⁻⁶ to 10⁻¹⁴. LPC reconstruction and an autoencoder neural network were used for speech reconstruction, and the effect of white noise masking on the reconstructed speech was also evaluated. The MSE between the original and the reconstructed speech without noise masking was zero for all speech samples, indicating a perfect reconstruction with excellent speech quality that mirrors the original. The MSE for the autoencoder neural network was between 10⁻² and 10⁻⁵. Thus the LPC reconstruction algorithm achieved perfect reconstruction quality, while the autoencoder neural network fell well short of it. Automatic speaker recognition (ASR) systems were further used to evaluate the reconstructed speech, with a confusion matrix computed in each case. The multilayer perceptron (MLP) and the recurrent neural network (RNN) were used as classifiers, each with 25, 65, or 215 hidden nodes. The reconstructed speech without noise masking achieved near-perfect speaker recognition when line spectral frequency (LSF) features were combined with an MLP of 215 hidden nodes. Offline software has also been developed to implement the disordered speech reconstruction system.
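The abstract above describes selecting AR model orders by the lowest AIC over the first 60 orders. Below is a minimal Python sketch of that idea, not the author's implementation: statsmodels is an assumed stand-in for whatever toolchain the thesis used, and `best_ar_order` and the example usage are illustrative names only.

```python
# Minimal sketch of AIC-based AR order selection for a speech sample,
# assuming the sample is already loaded as a 1-D NumPy array.
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

def best_ar_order(signal: np.ndarray, max_order: int = 60) -> int:
    """Fit AR(p) models for p = 1..max_order and return the order
    whose Akaike Information Criterion (AIC) is lowest."""
    aic_by_order = {}
    for p in range(1, max_order + 1):
        result = AutoReg(signal, lags=p).fit()
        aic_by_order[p] = result.aic
    return min(aic_by_order, key=aic_by_order.get)

# Hypothetical usage on one stuttered-speech sample:
# order = best_ar_order(speech_sample)
```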
Publication: An intelligent train sound recognition system for designing level crossing control system using (Kuala Lumpur: International Islamic University Malaysia, 2012). Ajibola, Alim Sabur.

A level crossing (LX) is an intersection between a railroad line and a public road, and it can be either passive or automated depending on the protection principle. Accidents at level crossings, which include deaths and serious injuries to road users and railway passengers, are usually severe. Active control at a level crossing uses flashing lights, bells, barrier arms, gates, or a combination of these devices, while passive control relies on signs instructing road users to check for approaching trains before crossing the rail lines. Sound recognition is the process of identifying the source or origin of a sound and is related to speech recognition. Environmental sounds are unstructured, noise-like, and variably composed, and are therefore difficult to model. Several studies on level crossing control have focused mainly on active control, which has become automated and has drastically reduced the loss of lives and property at level crossings; this has led to the use of intelligent systems such as fuzzy logic and expert systems. However, there is a need for research into alternative means of controlling not only the level crossing gates but also the traffic lights and bell, in order to reduce loss of lives at the level crossing. The aim of this research is to design and implement a robust system that can detect and classify unstructured sound using an Artificial Neural Network for the control of level crossings. To achieve this, samples of the sounds of cars, aeroplanes, thunder, rain, and trains were collected through field sampling as well as from online sound databases. The sounds were then preprocessed and their features extracted. Feature extraction consists of choosing the features that most effectively preserve class separability. These extracted features serve as input to the neural network tasked with classification. Mel Frequency Cepstral Coefficients and Perceptual Linear Prediction were used as feature extractors, and both the Multilayer Perceptron (MLP) and the Recurrent Neural Network (RNN) were utilized as classifiers. The performance of the system was evaluated using pattern recognition metrics such as identification accuracy, Receiver Operating Characteristics, and misclassification rate. In an attempt to replicate a real-life situation, the sounds were also mixed together in twos and threes. For the train sound, the MLP gives a sensitivity of 96.6% with a misclassification rate of 7.4%, whereas the RNN gives a sensitivity of 70% with a misclassification rate of 16%. For the train+aircraft (T+A) mix, the MLP gives a sensitivity of 53.3% with a misclassification rate of 51.7%, whereas the RNN gives a sensitivity of 76.7% with a misclassification rate of 15.8%. For the T+A+C mix, the MLP gives a sensitivity of 76.7% with a misclassification rate of 58.9%, whereas the RNN gives a sensitivity of 90% with a misclassification rate of 10%.
It has been shown by simulation that this novel level crossing control system has great potential that can be harnessed by the railway industry to help reduce the loss of lives and property that occurs as a result of collisions at railway crossings.
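As a rough illustration of the feature-extraction-plus-classification pipeline summarised in this abstract, the sketch below pairs MFCC features with an MLP classifier. It is not the thesis code: librosa and scikit-learn are assumed substitutes for the original toolchain, and `mfcc_features`, `train_classifier`, the file paths, and the 25-node hidden layer are illustrative placeholders.

```python
# Sketch: classify sound clips (train, aircraft, car, ...) from mean MFCC vectors.
import numpy as np
import librosa
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

def mfcc_features(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load one clip and summarise it as the mean MFCC vector."""
    y, sr = librosa.load(path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)

def train_classifier(files: list, labels: list) -> MLPClassifier:
    """Train an MLP on labelled clips and report held-out accuracy."""
    X = np.stack([mfcc_features(f) for f in files])
    y = np.array(labels)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)
    clf = MLPClassifier(hidden_layer_sizes=(25,), max_iter=1000, random_state=0)
    clf.fit(X_train, y_train)
    print("held-out accuracy:", clf.score(X_test, y_test))
    return clf

# Hypothetical usage with placeholder file names and class labels:
# clf = train_classifier(["train_01.wav", "car_01.wav"], ["train", "car"])
```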