Problem with comparative analysis of MFCC vectors -> Forum on Исходники.Ру


Problem with comparative analysis of MFCC vectors! A small problem..

MrKor

Post #1, 24.02.14, 17:56

Junior

Hello, forum members!

I have run into a problem.

I record speech signals and analyze them with a program (mostly an MFCC library for Java).

And I always get the same result:

-1 4 -10 6 -1 -1 3 -1 -1 2 -3 0 2

Then again, I am analyzing the whole sound file at once; maybe splitting it into pieces will help. Too bad I am not very good at that)) But let's try.

Added 24.02.14, 18:42
I recorded the words "привет" (hello) and "москва" (Moscow); the results are below:

МОСКВА (Moscow)

| -1 | 4 | -10 | 6 | -1 | -1 | 3 | -1 | -1 | 2 | -3 | 0 | 2

| -1 | 5 | -10 | 7 | 0 | -1 | 4 | -1 | 0 | 3 | -2 | 0 | 3

| 0 | 5 | -10 | 5 | -2 | -3 | 0 | -7 | -4 | 1 | -1 | 1 | 1

| 0 | 6 | -9 | 7 | 0 | -2 | 3 | -1 | -1 | 2 | -4 | -2 | 1

| 0 | 3 | -11 | 1 | -4 | 2 | 5 | -1 | 0 | 1 | -5 | -2 | -1

| -1 | 4 | -10 | 6 | -1 | -1 | 3 | -1 | -1 | 3 | -3 | 0 | 2

ПРИВЕТ (hello)

| -1 | 4 | -10 | 6 | -1 | -1 | 3 | -1 | -1 | 2 | -3 | 0 | 2

| 0 | 5 | -9 | 8 | 0 | -2 | 2 | 0 | 0 | 3 | -2 | 0 | 3

| 0 | 6 | -8 | 9 | 0 | -3 | 2 | -3 | -3 | 0 | -6 | -3 | 0

| -1 | 4 | -10 | 6 | -1 | -1 | 3 | -1 | -1 | 2 | -3 | 0 | 2

In principle I understand why this happens: the identical results are "silence".
Each row is 0.25 seconds. You can see that in the first case I spoke for about 0.75 seconds, and in the second for half a second oO

But why does it read exactly silence? Here is the "reading" code; maybe the gurus can tell me what it outputs:

String PatnToFile = "C:\\test\\music\\05L.wav";
File AudioFile = new File(PatnToFile);
ByteArrayOutputStream out = new ByteArrayOutputStream();
BufferedInputStream in;

try {
    audioInputStream = AudioSystem.getAudioInputStream(AudioFile);
} catch (UnsupportedAudioFileException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

format = audioInputStream.getFormat();
DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
if (!AudioSystem.isLineSupported(info)) {
    System.out.println("Error");
}

TargetDataLine line = null;
try {
    line = (TargetDataLine) AudioSystem.getLine(info);
    line.open(format);
} catch (LineUnavailableException ex) {
    System.out.println("Error");
}
line.start();

byte[] data = new byte[W * format.getSampleSizeInBits() / BITS_IN_BYTE*2];
double[] inbuf = new double[W/2];

try {
    in = new BufferedInputStream(new FileInputStream(AudioFile));
    int read;
    while ((read = in.read(data)) > 0) {
        out.write(data, 0, read);
    }
    out.flush();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}

data = out.toByteArray();
decode(data, inbuf);

Here the sound file is, well, the recorded speech.

Added 24.02.14, 18:42
Thanks in advance for your answers!

nsh

Post #2, 25.02.14, 07:50

Moderator

Quote

But why does it read exactly silence? Here is the "reading" code; maybe the gurus can tell me what it outputs:

It reads whatever is in the file. There is nothing unusual about that.

Quote

In principle I understand why this happens: the identical results are "silence"

If the file does not contain identical segments at the beginning and at the end, then identical results are not silence but a bug in the code.

To get help with the code, you need to show that code.

Quote

Each row is 0.25 seconds

For speech analysis each row should be 10 times shorter: 0.025 seconds.
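To make the difference between 0.25 s and 0.025 s rows concrete, here is a minimal sketch that converts a window duration into a number of samples. It assumes a 16 kHz recording (the rate nsh states later in the thread for these files); the class and method names are hypothetical and are not part of the library used above.

```java
// Hypothetical helper for illustration only.
class FrameMath {

    /** Number of samples covered by a window of the given duration. */
    static int samplesPerFrame(double sampleRateHz, double frameSeconds) {
        return (int) Math.round(sampleRateHz * frameSeconds);
    }

    /** Number of complete, non-overlapping frames in a signal. */
    static int fullFrames(int totalSamples, int frameSamples) {
        return totalSamples / frameSamples;
    }

    public static void main(String[] args) {
        double sampleRate = 16000.0; // assumed sample rate
        int shortFrame = samplesPerFrame(sampleRate, 0.025);
        int longFrame = samplesPerFrame(sampleRate, 0.25);
        System.out.println("samples in 0.025 s: " + shortFrame);
        System.out.println("samples in 0.25 s: " + longFrame);
        System.out.println("rows per second at 0.025 s: " + fullFrames(16000, shortFrame));
    }
}
```

The number of files stays the same either way; only the number of rows computed per file grows when the window gets shorter.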

MrKor

Post #3, 25.02.14, 09:52

Junior

Wow, thanks nsh! I did it with this library:
http://code.ohloh.net/file?fid=EQDSocdzPtN...elected=true#L0

I don't know, maybe there really is some little bug in there...

Quote nsh @ 25.02.14, 07:50

For speech analysis each row should be 10 times shorter: 0.025 seconds.

Wow, how many files will that be....

And each one has to be processed too..

Well, I don't know, there don't seem to be any identical moments in the sound files; it's just that I am silent for the milliseconds before the recording starts and before it ends)) So it analyzes that silence and outputs identical vector segments.

nsh

Post #4, 25.02.14, 15:37

Moderator

Quote

I don't know, maybe there really is some little bug in there...

You need to show your own code, not the library's code. Libraries usually do have bugs, but not such significant ones.
Look for bugs in your own code first.

It is also worth posting the files you use.

Quote

Wow, how many files will that be....

There will be the same number of files. There will be more rows.

MrKor

Post #5, 25.02.14, 18:19

Junior

But that is all the code) the library code + the code in my first post)))

As for the files, I meant that when Audacity splits by intervals it creates many files; if I get 6 files with 0.25 seconds, how many will there be with 0.025...

How to analyze all that afterwards.. Phew))

In any case, thank you, nsh, for bearing with me and helping))

Added 25.02.14, 18:22
Oops, I forgot, there is also the decode method:

public static void decode(byte[] input, double[] output) {
    assert input.length == 2 * output.length;
    for (int i = 0; i < output.length; i++) {
        output[i] = (short) (((0xFF & input[2 * i + 1]) << 8) | (0xFF & input[2 * i]));
        output[i] /= Short.MAX_VALUE;
    }
}

That's all, and then we print the results:

inbuf[2] = 10;
inbuf[4] = 14;

double[] dparameters = mfcc.getParameters(inbuf);
System.out.println("MFCC parameters:");
for (int i = 0; i < dparameters.length; i++) {
    System.out.print(" " + dparameters[i]);
}

int[] mfccresult = new int[dparameters.length];
for (int i = 0; i < dparameters.length; i++) {
    mfccresult[i] = (int) dparameters[i];
}

for (int i = 0; i < mfccresult.length; i++) {
    System.out.print(mfccresult[i]);
}

That is all the code.

MrKor

Post #6, 26.02.14, 14:09

Junior

So, nsh, what do you advise?)

nsh

Post #7, 27.02.14, 08:29

Moderator

Quote

So, nsh, what do you advise?)

So far I cannot see:

1. The complete code, not fragments. Where is the variable W defined? Where is the mfcc object created?

2. Clean code. Why open a DataLine for reading from the microphone when you can read from the file? Why the strange assignments like inbuf[4] = 14?

3. The source sound files

So it is hard to advise anything

MrKor

Post #8, 27.02.14, 12:10

Junior

Quote nsh @ 27.02.14, 08:29

Why the strange assignments like inbuf[4] = 14?

That was already there...

Here is the whole code:

Part 1:

import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import javax.sound.sampled.AudioFileFormat;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.DataLine;
import javax.sound.sampled.LineUnavailableException;
import javax.sound.sampled.TargetDataLine;
import javax.sound.sampled.UnsupportedAudioFileException;

public class mfcc {

    private static final boolean m_ousePowerInsteadOfMagnitude = false;
    private final int m_nnumberOfParameters;
    private final double m_dsamplingFrequency;
    private final int m_nnumberOfFilters;
    private final int m_nFFTLength;
    private final int m_nlifteringCoefficient;
    private final boolean m_oisLifteringEnabled;
    private final double m_dminimumFilterOutput = 1.0;
    private final boolean m_oisZeroThCepstralCoefficientCalculated;
    private final double m_dlogFilterOutputFloor = 0.0;
    private int[][] m_nboundariesDFTBins;
    private double[][] m_dweights;
    private fftt m_fft;
    private double[][] m_ddCTMatrix;
    private double[] m_dfilterOutput;
    private final double[] m_nlifteringMultiplicationFactor;
    private final double m_dscalingFactor;

    // File type
    private static AudioFileFormat.Type fileType = AudioFileFormat.Type.WAVE;
    private static AudioInputStream audioInputStream;
    private static AudioFormat format;

    // Size of the array used for reading the file
    final static int W = 1024;
    private static int BITS_IN_BYTE = 8;

    public static void main(String[] args) {
        int nnumberofFilters = 24;
        int nlifteringCoefficient = 22;
        boolean oisLifteringEnabled = true;
        boolean oisZeroThCepstralCoefficientCalculated = false;
        int nnumberOfMFCCParameters = 13; // Number of coefficients
        double dsamplingFrequency = 8000.0;
        int nFFTLength = 512;

        if (oisZeroThCepstralCoefficientCalculated) {
            // take in account the zero-th MFCC
            nnumberOfMFCCParameters = nnumberOfMFCCParameters + 1;
        } else {
            nnumberOfMFCCParameters = nnumberOfMFCCParameters;
        }

        mfcc mfcc = new mfcc(nnumberOfMFCCParameters, dsamplingFrequency,
                nnumberofFilters, nFFTLength, oisLifteringEnabled,
                nlifteringCoefficient, oisZeroThCepstralCoefficientCalculated);
        System.out.println(mfcc.toString());

        //String PatnToFile = "C:\\test\\music\\mix20" + "." + fileType;
        String PatnToFile = "C:\\test\\music\\05L.wav";
        File AudioFile = new File(PatnToFile);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BufferedInputStream in;

        try {
            audioInputStream = AudioSystem.getAudioInputStream(AudioFile);
        } catch (UnsupportedAudioFileException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        format = audioInputStream.getFormat();
        DataLine.Info info = new DataLine.Info(TargetDataLine.class, format);
        if (!AudioSystem.isLineSupported(info)) {
            System.out.println("Error");
        }

        TargetDataLine line = null;
        try {
            line = (TargetDataLine) AudioSystem.getLine(info);
            line.open(format);
        } catch (LineUnavailableException ex) {
            System.out.println("Error");
        }
        line.start();

        byte[] data = new byte[W * format.getSampleSizeInBits() / BITS_IN_BYTE*2];
        double[] inbuf = new double[W/2];

        try {
            in = new BufferedInputStream(new FileInputStream(AudioFile));
            int read;
            while ((read = in.read(data)) > 0) {
                out.write(data, 0, read);
            }
            out.flush();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        data = out.toByteArray();
        decode(data, inbuf);

        /* TEST */
        //double[] x = new double[160];
        inbuf[2] = 10;
        inbuf[4] = 14;

        double[] dparameters = mfcc.getParameters(inbuf);
        System.out.println("MFCC parameters:");
        for (int i = 0; i < dparameters.length; i++) {
            System.out.print(" " + dparameters[i]);
        }

        int[] mfccresult = new int[dparameters.length];
        for (int i = 0; i < dparameters.length; i++) {
            mfccresult[i] = (int) dparameters[i];
        }

        System.out.println("=====================");
        System.out.println("Results: ");
        for (int i = 0; i < mfccresult.length; i++) {
            System.out.print(" | " + mfccresult[i]);
        }
    }

    /**
     * Converts the byte array into an array of type double;
     * the output has 2048 values (by default)
     */
    public static void decode(byte[] input, double[] output) {
        assert input.length == 2 * output.length;
        for (int i = 0; i < output.length; i++) {
            output[i] = (short) (((0xFF & input[2 * i + 1]) << 8) | (0xFF & input[2 * i]));
            output[i] /= Short.MAX_VALUE;
        }
    }

    public mfcc(int nnumberOfParameters, double dsamplingFrequency,
            int nnumberofFilters, int nFFTLength, boolean oisLifteringEnabled,
            int nlifteringCoefficient,
            boolean oisZeroThCepstralCoefficientCalculated) {

        m_oisZeroThCepstralCoefficientCalculated = oisZeroThCepstralCoefficientCalculated;
        if (m_oisZeroThCepstralCoefficientCalculated) {
            m_nnumberOfParameters = nnumberOfParameters - 1;
        } else {
            m_nnumberOfParameters = nnumberOfParameters;
        }

        m_dsamplingFrequency = dsamplingFrequency;
        m_nnumberOfFilters = nnumberofFilters;
        m_nFFTLength = nFFTLength;

        calculateMelBasedFilterBank(dsamplingFrequency, nnumberofFilters,
                nFFTLength);
        m_fft = new fftt(m_nFFTLength); // initialize FFT
        initializeDCTMatrix();

        m_nlifteringCoefficient = nlifteringCoefficient;
        m_oisLifteringEnabled = oisLifteringEnabled;
        m_dfilterOutput = new double[m_nnumberOfFilters];
        m_dscalingFactor = Math.sqrt(2.0 / m_nnumberOfFilters);

        if (m_oisLifteringEnabled) {
            int nnumberOfCoefficientsToLift = m_nnumberOfParameters;
            m_nlifteringMultiplicationFactor = new double[m_nlifteringCoefficient];
            double dfactor = m_nlifteringCoefficient / 2.0;
            double dfactor2 = Math.PI / m_nlifteringCoefficient;
            for (int i = 0; i < m_nlifteringCoefficient; i++) {
                m_nlifteringMultiplicationFactor[i] = 1.0 + dfactor
                        * Math.sin(dfactor2 * (i + 1));
            }
            if (m_nnumberOfParameters > m_nlifteringCoefficient) {
                new Error(
                        "Liftering is enabled and the number "
                        + "of parameters = "
                        + m_nnumberOfParameters
                        + ", while "
                        + "the liftering coefficient is "
                        + m_nlifteringCoefficient
                        + ". In this case some cepstrum coefficients would be made "
                        + "equal to zero due to liftering, what does not make much "
                        + "sense in a speech recognition system. You may want to "
                        + "increase the liftering coefficient or decrease the number "
                        + "of MFCC parameters.");
            }
        } else {
            m_nlifteringMultiplicationFactor = null;
        }
    }

    /** Initializes the DCT matrix. */
    private void initializeDCTMatrix() {
        m_ddCTMatrix = new double[m_nnumberOfParameters][m_nnumberOfFilters];
        for (int i = 0; i < m_nnumberOfParameters; i++) {
            for (int j = 0; j < m_nnumberOfFilters; j++) {
                m_ddCTMatrix[i][j] = Math.cos((i + 1.0) * (j + 1.0 - 0.5)
                        * (Math.PI / m_nnumberOfFilters));
            }
        }
    }

    public static double[] convertHzToMel(double[] dhzFrequencies,
            double dsamplingFrequency) {
        double[] dmelFrequencies = new double[dhzFrequencies.length];
        for (int k = 0; k < dhzFrequencies.length; k++) {
            dmelFrequencies[k] = 2595.0 * (Math
                    .log(1.0 + (dhzFrequencies[k] / 700.0)) / Math.log(10));
        }
        return dmelFrequencies;
    }

    private void calculateMelBasedFilterBank(double dsamplingFrequency,
            int nnumberofFilters, int nfftLength) {
        // frequencies for each triangular filter
        double[][] dfrequenciesInMelScale = new double[nnumberofFilters][3];
        // the +1 below is due to the sample of frequency pi (or fs/2)
        double[] dfftFrequenciesInHz = new double[nfftLength / 2 + 1];
        // compute the frequency of each FFT sample (in Hz):
        double ddeltaFrequency = dsamplingFrequency / nfftLength;
        for (int i = 0; i < dfftFrequenciesInHz.length; i++) {
            dfftFrequenciesInHz[i] = i * ddeltaFrequency;
        }
        // convert Hz to Mel
        double[] dfftFrequenciesInMel = this.convertHzToMel(
                dfftFrequenciesInHz, dsamplingFrequency);

        // compute the center frequencies. Notice that 2 filters are
        // "artificially" created in the endpoints of the frequency
        // scale, correspondent to 0 and fs/2 Hz.
        double[] dfilterCenterFrequencies = new double[nnumberofFilters + 2];
        // implicitly: dfilterCenterFrequencies[0] = 0.0;
        ddeltaFrequency = dfftFrequenciesInMel[dfftFrequenciesInMel.length - 1]
                / (nnumberofFilters + 1);
        for (int i = 1; i < dfilterCenterFrequencies.length; i++) {
            dfilterCenterFrequencies[i] = i * ddeltaFrequency;
        }

        // initialize member variables
        m_nboundariesDFTBins = new int[m_nnumberOfFilters][2];
        m_dweights = new double[m_nnumberOfFilters][];

        // notice the loop starts from the filter i=1 because i=0 is the one
        // centered at DC
        for (int i = 1; i <= nnumberofFilters; i++) {
            m_nboundariesDFTBins[i - 1][0] = Integer.MAX_VALUE;
            // notice the loop below doesn't include the first and last FFT
            // samples
            for (int j = 1; j < dfftFrequenciesInMel.length - 1; j++) {
                // see if frequency j is inside the bandwidth of filter i
                if ((dfftFrequenciesInMel[j] >= dfilterCenterFrequencies[i - 1])
                        & (dfftFrequenciesInMel[j] <= dfilterCenterFrequencies[i + 1])) {
                    // the i-1 below is due to the fact that we discard the
                    // first filter i=0
                    // look for the first DFT sample for this filter
                    if (j < m_nboundariesDFTBins[i - 1][0]) {
                        m_nboundariesDFTBins[i - 1][0] = j;
                    }
                    // look for the last DFT sample for this filter
                    if (j > m_nboundariesDFTBins[i - 1][1]) {
                        m_nboundariesDFTBins[i - 1][1] = j;
                    }
                }
            }
        }

        // check for consistency. The problem below would happen just
        // in case of a big number of MFCC parameters for a small DFT length.
        for (int i = 0; i < nnumberofFilters; i++) {
            if (m_nboundariesDFTBins[i][0] == m_nboundariesDFTBins[i][1]) {
                new Error(
                        "Error in MFCC filter bank. In filter "
                        + i
                        + " the first sample is equal to the last sample !"
                        + " Try changing some parameters, for example, decreasing the number of filters.");
            }
        }

        // allocate space
        for (int i = 0; i < nnumberofFilters; i++) {
            m_dweights[i] = new double[m_nboundariesDFTBins[i][1]
                    - m_nboundariesDFTBins[i][0] + 1];
        }

        // calculate the weights
        for (int i = 1; i <= nnumberofFilters; i++) {
            for (int j = m_nboundariesDFTBins[i - 1][0], k = 0; j <= m_nboundariesDFTBins[i - 1][1]; j++, k++) {
                if (dfftFrequenciesInMel[j] < dfilterCenterFrequencies[i]) {
                    m_dweights[i - 1][k] = (dfftFrequenciesInMel[j] - dfilterCenterFrequencies[i - 1])
                            / (dfilterCenterFrequencies[i] - dfilterCenterFrequencies[i - 1]);
                } else {
                    m_dweights[i - 1][k] = 1.0 - ((dfftFrequenciesInMel[j] - dfilterCenterFrequencies[i]) / (dfilterCenterFrequencies[i + 1] - dfilterCenterFrequencies[i]));
                }
            }
        }
    }

    public double[] getParameters(double[] fspeechFrame) {
        // use mel filter bank
        for (int i = 0; i < m_nnumberOfFilters; i++) {
            m_dfilterOutput[i] = 0.0;
            // Notice that the FFT samples at 0 (DC) and fs/2 are not considered
            // on this calculation
            if (m_ousePowerInsteadOfMagnitude) {
                double[] fpowerSpectrum = m_fft.calculateFFTPower(fspeechFrame);
                for (int j = m_nboundariesDFTBins[i][0], k = 0; j <= m_nboundariesDFTBins[i][1]; j++, k++) {
                    m_dfilterOutput[i] += fpowerSpectrum[j] * m_dweights[i][k];
                }
            } else {
                double[] fmagnitudeSpectrum = m_fft
                        .calculateFFTMagnitude(fspeechFrame);
                for (int j = m_nboundariesDFTBins[i][0], k = 0; j <= m_nboundariesDFTBins[i][1]; j++, k++) {
                    m_dfilterOutput[i] += fmagnitudeSpectrum[j]
                            * m_dweights[i][k];
                }
            }

            // ISIP (Mississipi univ.) implementation
            if (m_dfilterOutput[i] > m_dminimumFilterOutput) { // floor power to avoid log(0)
                m_dfilterOutput[i] = Math.log(m_dfilterOutput[i]); // using ln
            } else {
                m_dfilterOutput[i] = m_dlogFilterOutputFloor;
            }
        }

        double[] dMFCCParameters = null;
        if (m_oisZeroThCepstralCoefficientCalculated) {
            dMFCCParameters = new double[m_nnumberOfParameters + 1];
            // calculates zero'th cepstral coefficient and pack it
            // after the MFCC parameters of each frame for the sake
            // of compatibility with HTK
            double dzeroThCepstralCoefficient = 0.0;
            for (int j = 0; j < m_nnumberOfFilters; j++) {
                dzeroThCepstralCoefficient += m_dfilterOutput[j];
            }
            dzeroThCepstralCoefficient *= m_dscalingFactor;
            dMFCCParameters[dMFCCParameters.length - 1] = dzeroThCepstralCoefficient;
        } else {
            // allocate space
            dMFCCParameters = new double[m_nnumberOfParameters];
        }

        // cosine transform
        for (int i = 0; i < m_nnumberOfParameters; i++) {
            for (int j = 0; j < m_nnumberOfFilters; j++) {
                dMFCCParameters[i] += m_dfilterOutput[j] * m_ddCTMatrix[i][j];
                // the original equations have the first index as 1
            }
            // could potentially incorporate liftering factor and
            // factor below to save multiplications, but will not
            // do it for the sake of clarity
            dMFCCParameters[i] *= m_dscalingFactor;
        }

        if (m_oisLifteringEnabled) {
            for (int i = 0; i < m_nnumberOfParameters; i++) {
                dMFCCParameters[i] *= m_nlifteringMultiplicationFactor[i];
            }
        }

        return dMFCCParameters;
    } // end method

    /**
     * Returns the sampling frequency.
     */
    public double getSamplingFrequency() {
        return this.m_dsamplingFrequency;
    }

    /**
     * Returns the number of points of the Fast Fourier Transform (FFT) used in
     * the calculation of this MFCC.
     */
    public int getFFTLength() {
        return m_nFFTLength;
    }

    /**
     * Returns the number of MFCC coefficients, including the 0-th if required
     * by user in the object construction.
     */
    public int getNumberOfCoefficients() {
        return (m_oisZeroThCepstralCoefficientCalculated ? (m_nnumberOfParameters + 1)
                : m_nnumberOfParameters);
    }

    /**
     * Return a string with all important parameters of this object.
     */
    public String toString() {
        return "MFCC.nnumberOfParameters = "
                + (m_oisZeroThCepstralCoefficientCalculated ? (m_nnumberOfParameters + 1)
                        : m_nnumberOfParameters) + "\n"
                + "MFCC.nnumberOfFilters = " + m_nnumberOfFilters + "\n"
                + "MFCC.nFFTLength = " + m_nFFTLength + "\n"
                + "MFCC.dsamplingFrequency = " + m_dsamplingFrequency + "\n"
                + "MFCC.nlifteringCoefficient = " + m_nlifteringCoefficient
                + "\n" + "MFCC.oisLifteringEnabled = " + m_oisLifteringEnabled
                + "\n" + "MFCC.oisZeroThCepstralCoefficientCalculated = "
                + m_oisZeroThCepstralCoefficientCalculated;
    }

    public double[] getFilterBankOutputs(double[] fspeechFrame) {
        double dfilterOutput[] = new double[m_nnumberOfFilters];
        for (int i = 0; i < m_nnumberOfFilters; i++) {
            if (m_ousePowerInsteadOfMagnitude) {
                double[] fpowerSpectrum = m_fft.calculateFFTPower(fspeechFrame);
                for (int j = m_nboundariesDFTBins[i][0], k = 0; j <= m_nboundariesDFTBins[i][1]; j++, k++) {
                    dfilterOutput[i] += fpowerSpectrum[j] * m_dweights[i][k];
                }
            } else {
                double[] fmagnitudeSpectrum = m_fft
                        .calculateFFTMagnitude(fspeechFrame);
                for (int j = m_nboundariesDFTBins[i][0], k = 0; j <= m_nboundariesDFTBins[i][1]; j++, k++) {
                    dfilterOutput[i] += fmagnitudeSpectrum[j]
                            * m_dweights[i][k];
                }
            }

            if (dfilterOutput[i] > m_dminimumFilterOutput) { // floor power to avoid log(0)
                dfilterOutput[i] = Math.log(dfilterOutput[i]); // using ln
            } else {
                dfilterOutput[i] = m_dlogFilterOutputFloor;
            }
        }
        return dfilterOutput;
    }

The second part is here:

http://pastebin.com/sR52wqYR

I don't know, I thought it would be more correct to do it with a DataLine...

There aren't really many sound files)

Here is a sample file; I can also post lots and lots of files that Audacity produced from it when I split it into 0.25-second pieces.

Thank you!
Attached file

mix21.WAVE (56.04 KB)

Added 27.02.14, 12:12
To be honest, I don't have a decent microphone )) so I was wondering where I could get at least a couple of words in WAVE format...

nsh

Post #9, 27.02.14, 13:34

Moderator

The window size is 1024, but it should be 320

double dsamplingFrequency in the program is 8000.0, while your files are at 16 kHz

The sound should be cut into windows in the program, not in Audacity

The DataLine is not needed; it can be removed

It is better to post the program as an archive, so that it can be run, rather than pasting it in two different places

The Mel coefficients should not be cast to int; they must stay floating-point

It is better to write your own code in separate files, not inside someone else's. Then it will be clearer where to look for the bug.

Try to fix at least some of these remarks first
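Two of the remarks above (drop the DataLine, and do not hard-code 8000 Hz for 16 kHz files) can be illustrated with a minimal sketch that reads the samples through AudioInputStream only. It assumes 16-bit signed mono PCM; WavSamples, decodePcm16, readMono16 and sampleRateOf are hypothetical names and are not part of the posted program. Unlike reading the file with FileInputStream, the AudioInputStream delivers only the audio data, so the WAV header is not decoded as samples.

```java
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.IOException;
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;

// Hypothetical helper class for illustration; assumes 16-bit signed mono PCM.
class WavSamples {

    /** Decodes 16-bit signed PCM bytes into doubles in the range [-1, 1). */
    static double[] decodePcm16(byte[] bytes, boolean bigEndian) {
        double[] samples = new double[bytes.length / 2];
        for (int i = 0; i < samples.length; i++) {
            int first = bytes[2 * i];
            int second = bytes[2 * i + 1];
            int high = bigEndian ? first : second;          // keeps the sign
            int low = (bigEndian ? second : first) & 0xFF;  // unsigned low byte
            samples[i] = ((high << 8) | low) / 32768.0;
        }
        return samples;
    }

    /** Reads all samples of a WAV file; no microphone line is involved. */
    static double[] readMono16(File wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            AudioFormat format = in.getFormat();
            if (!AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                    || format.getSampleSizeInBits() != 16
                    || format.getChannels() != 1) {
                throw new UnsupportedAudioFileException(
                        "expected 16-bit signed mono PCM: " + format);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
            return decodePcm16(out.toByteArray(), format.isBigEndian());
        }
    }

    /** Sample rate stored in the file, to be used instead of a hard-coded value. */
    static float sampleRateOf(File wav) throws IOException, UnsupportedAudioFileException {
        try (AudioInputStream in = AudioSystem.getAudioInputStream(wav)) {
            return in.getFormat().getSampleRate();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.out.println("usage: java WavSamples <file.wav>");
            return;
        }
        File wav = new File(args[0]);
        System.out.println("sample rate: " + sampleRateOf(wav));
        System.out.println("samples read: " + readMono16(wav).length);
    }
}
```

The value returned by sampleRateOf could then be passed to the MFCC constructor as dsamplingFrequency, and the returned double[] keeps the whole recording rather than only its first W/2 values.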

MrKor

Post #10, 27.02.14, 15:27

Junior

Quote nsh @ 27.02.14, 13:34

I fixed the window size, but why 320?)

I don't yet know how to cut sound programmatically; how is it done at all?)

Maybe split the bytes read from the file into equal parts?

As for the mels, I don't quite see it yet..

Thank you for helping!

MrKor

Post #11, 27.02.14, 17:50

Junior

Or did you mean printing the result itself as double?)

nsh

Post #12, 28.02.14, 21:50

Moderator

Good; once you get something, post the updated code and the results.

As for cutting into windows in code, it can be done like this:

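// data must already hold decoded samples (float[]), not raw bytes:
// System.arraycopy cannot copy between arrays of different primitive types
ArrayList<float[]> frames = new ArrayList<float[]>();
// stop while a full frame still fits, otherwise arraycopy runs past the end of data
for (int i = 0; i + frame_size <= data.length; i = i + frame_shift) {
    float[] frame = new float[frame_size];
    System.arraycopy(data, i, frame, 0, frame_size);
    frames.add(frame);
}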
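A self-contained variant of the windowing idea above, as a sketch. It works on double[] because getParameters in the posted class takes a double[] frame; the name Framing.split and the decision to drop an incomplete trailing frame are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class Framing {

    /**
     * Splits samples into frames of frameSize values, a new frame starting
     * every frameShift samples. A trailing partial frame is dropped.
     */
    static List<double[]> split(double[] samples, int frameSize, int frameShift) {
        if (frameSize <= 0 || frameShift <= 0) {
            throw new IllegalArgumentException("frameSize and frameShift must be positive");
        }
        List<double[]> frames = new ArrayList<>();
        for (int start = 0; start + frameSize <= samples.length; start += frameShift) {
            double[] frame = new double[frameSize];
            System.arraycopy(samples, start, frame, 0, frameSize);
            frames.add(frame);
        }
        return frames;
    }

    public static void main(String[] args) {
        double[] samples = new double[10];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = i;
        }
        List<double[]> frames = split(samples, 4, 3);
        System.out.println("number of frames: " + frames.size());
    }
}
```

Each element of the returned list can then be processed separately, which gives one MFCC row per frame for a single input file.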

Post edited: nsh - 28.02.14, 21:51

MrKor

Post #13, 01.03.14, 12:03

Junior

Wow, thanks nsh! Could you please clarify what goes where in the code? So the conversion to double is no longer needed, is it? And how many windows will the cutting produce?) I just hope bytes don't get mixed up with floats and so on...

MrKor

Post #14, 01.03.14, 13:08

Junior

It's just, those values of yours, should I pick my own or get them from the format?:

System.out.println("=====Info=====");
System.out.println(format.getFrameSize());
double sampleSize = format.getSampleSizeInBits()/8;
double FramesCount = data.length / (sampleSize*format.getChannels());
System.out.println(FramesCount);

nsh

Post #15, 02.03.14, 08:07

Moderator

> should I pick my own values or get them from the format

frame size: 410 samples

frame shift: 320 samples

fft size: 512

sample rate: 16000
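To relate these four numbers to each other, here is a small sketch. The helper names are hypothetical, and padding a 410-sample frame with zeros up to the FFT size of 512 is an assumption about how a frame is fitted to the FFT length; the thread does not say how the library handles a shorter frame.

```java
import java.util.Arrays;

// Hypothetical helper for illustration; the values in main come from the post above.
class FrameParams {

    /** Duration in milliseconds of the given number of samples. */
    static double millis(int samples, double sampleRateHz) {
        return 1000.0 * samples / sampleRateHz;
    }

    /** Number of complete frames that fit into totalSamples. */
    static int frameCount(int totalSamples, int frameSize, int frameShift) {
        if (totalSamples < frameSize) {
            return 0;
        }
        return 1 + (totalSamples - frameSize) / frameShift;
    }

    /** Copies the frame into an array of fftSize values, zero-filling the tail (assumed scheme). */
    static double[] zeroPad(double[] frame, int fftSize) {
        if (frame.length > fftSize) {
            throw new IllegalArgumentException("frame is longer than the FFT size");
        }
        return Arrays.copyOf(frame, fftSize);
    }

    public static void main(String[] args) {
        int frameSize = 410;
        int frameShift = 320;
        int fftSize = 512;
        double sampleRate = 16000.0;
        System.out.println("frame length, ms: " + millis(frameSize, sampleRate));
        System.out.println("frame shift, ms: " + millis(frameShift, sampleRate));
        System.out.println("frames in 1 s: " + frameCount(16000, frameSize, frameShift));
        System.out.println("padded length: " + zeroPad(new double[frameSize], fftSize).length);
    }
}
```

With these values a frame lasts about 25.6 ms and a new frame starts every 20 ms, which matches the earlier advice to use rows of roughly 0.025 seconds.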

Added 02.03.14, 08:54
In general, though, it is better to take a more sensible library, for example CMUSphinx http://cmusphinx.sourceforge.net/wiki/sphinx4. The code for extracting MFCC is in sphinx4/edu/cmu/sphinx/tools/feature/FeatureFileDumper.java.

That Spantus has lots of classes but little use. Apparently the author read too much of a book on design patterns and churned out a pile of services, facades, and so on.
