Jeongwooyeol

Lab Study

[Paper] HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
https://arxiv.org/abs/2010.05646
"Several recent work on speech synthesis have employed generative adversarial networks (GANs) to produce raw waveforms. Although such methods improve the sampling efficiency and memory usage, their sample quality has not yet reached that of autoregressive a.."
This paper ..
Lab Study 2024. 3. 8.

[Paper] VocGAN: A High-Fidelity Real-time Vocoder with a Hierarchically-nested Adversarial Network
https://arxiv.org/abs/2007.15256
"We present a novel high-fidelity real-time neural vocoder called VocGAN. A recently developed GAN-based vocoder, MelGAN, produces speech waveforms in real-time. However, it often produces a waveform that is insufficient in quality or inconsistent with acou.."
This paper ..
Lab Study 2024. 3. 7.

[Paper] MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
https://arxiv.org/abs/1910.06711
"Previous works (Donahue et al., 2018a; Engel et al., 2019a) have found that generating coherent raw audio waveforms with GANs is challenging. In this paper, we show that it is possible to train GANs reliably to generate high quality coherent waveforms by i.."
I wrote this post after reading the paper above. Abstr..
Lab Study 2024. 3. 6.

[Paper] WaveGlow: A Flow-Based Generative Network for Speech Synthesis
https://arxiv.org/abs/1811.00002
"In this paper we propose WaveGlow: a flow-based network capable of generating high quality speech from mel-spectrograms. WaveGlow combines insights from Glow and WaveNet in order to provide fast, efficient and high-quality audio synthesis, without the need.."
I wrote this post after reading the paper above. Abstract: In this paper, W..
Lab Study 2024. 3. 3.

[Paper] WaveNet: A Generative Model for Raw Audio
https://arxiv.org/abs/1609.03499
"This paper introduces WaveNet, a deep neural network for generating raw audio waveforms. The model is fully probabilistic and autoregressive, with the predictive distribution for each audio sample conditioned on all previous ones; nonetheless we show that.."
I wrote this post after reading the paper above. Abstract: The authors ... generating raw audio waveforms ..
Lab Study 2024. 2. 29.

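Source Filtering
The source-filter model is a way of mathematically modeling the process by which a person speaks. Human speech begins as a signal from the lungs, becomes a periodic signal as it passes through the sound source (the vocal folds), and is then shaped by articulation as it travels through the vocal tract. The figure above illustrates this. Air expelled from the lungs passes through the vocal folds and becomes an airflow that carries a periodic signal; when the folds vibrate at a steady frequency, that frequency is called the fundamental frequency. The signal produced at the vocal folds, whether periodic or noise-like, is called the excitation. The vocal folds vibrate, producing harmonics and noise ..
Lab Study 2024. 2. 27.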
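The excitation-then-vocal-tract idea in this excerpt can be sketched in a few lines of Python. This is only a minimal illustration under my own assumptions, not the method from the post: the function names `impulse_train` and `resonator`, the 8 kHz sample rate, the 100 Hz fundamental frequency, and the single 700 Hz resonance standing in for the vocal tract are all hypothetical choices.

```python
import math


def impulse_train(f0, sample_rate, num_samples):
    """Periodic excitation: one unit pulse per pitch period (hypothetical source model)."""
    period = round(sample_rate / f0)  # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(num_samples)]


def resonator(excitation, center_hz, bandwidth_hz, sample_rate):
    """Two-pole resonant filter standing in for a single vocal-tract resonance."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate)  # pole radius, below 1 so the filter is stable
    theta = 2.0 * math.pi * center_hz / sample_rate      # pole angle
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    out = []
    y1 = y2 = 0.0
    for x in excitation:
        # y[n] = x[n] + a1 * y[n-1] + a2 * y[n-2]
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


# Assumed values: 8 kHz sampling, 100 Hz fundamental frequency, 0.1 s of signal.
excitation = impulse_train(100.0, 8000, 800)
speech = resonator(excitation, 700.0, 100.0, 8000)
print(len(speech), sum(excitation))
```

The excitation carries the pitch, and the filter only reshapes its spectrum. A real vocal tract has several resonances, so a fuller sketch would chain more than one filter.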

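Phonetics study notes
Sound is a wave produced by a vibrating object that can be heard by the ear. Noise is any sound other than the one a listener wants to hear; it is a meaningless signal that is always present at some level in nature. Voice is the concrete, physical sound produced by the human vocal organs, through which a person conveys meaning as an acoustic signal. Speech is the concrete, physical sound produced by the human vocal organs, through which a person conveys meaning using language. A phoneme is the sound as understood once a listener has received the acoustic signal and grasped what was said. Phonetics is the study of the physical properties of the sounds used in language. Sound is a continuum in time and space, and ..
Lab Study 2024. 2. 27.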

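spectrum, spectrogram, Mel-spectrogram, MFCC
The information carried by speech (the kind of pronunciation, gender, timbre, pitch, and so on) cannot easily be read off the raw speech signal; it has to be extracted through mathematical signal processing. One representative approach is to observe speech along a different axis: frequency (Hz). Frequency is the number of times a signal oscillates per second. The faster a sound oscillates, that is, the higher its frequency, the higher it sounds; a low frequency is heard as a low pitch. Every sound heard in nature is a sum of many frequency components, so the Fourier transform is used to decompose a sound into those components. With the Fourier transform, speech of a given duration ..
Lab Study 2024. 2. 27.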
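The decomposition into frequency components can be tried on a toy signal. Below is a minimal sketch using a naive discrete Fourier transform in plain Python. The helper names `dft_magnitudes` and `dominant_frequencies`, the 64 Hz sample rate, and the 5 Hz and 12 Hz test tones are my own assumptions for illustration; real speech processing would normally use an FFT library.

```python
import cmath
import math


def dft_magnitudes(signal):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin, for a real-valued signal."""
    n = len(signal)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(signal))
        mags.append(abs(acc))
    return mags


def dominant_frequencies(signal, sample_rate, count):
    """Frequencies (Hz) of the `count` strongest bins, in ascending order."""
    mags = dft_magnitudes(signal)
    bin_width = sample_rate / len(signal)  # Hz covered by one bin
    top_bins = sorted(range(len(mags)), key=lambda k: mags[k], reverse=True)[:count]
    return sorted(k * bin_width for k in top_bins)


# Assumed toy signal: one second sampled at 64 Hz, a 5 Hz tone plus a weaker 12 Hz tone.
sample_rate = 64
signal = [
    math.sin(2 * math.pi * 5 * t / sample_rate)
    + 0.5 * math.sin(2 * math.pi * 12 * t / sample_rate)
    for t in range(sample_rate)
]
print(dominant_frequencies(signal, sample_rate, 2))  # [5.0, 12.0]
```

The time-domain samples look like a single wobbly curve, but the two strongest bins recover exactly the two tones that were mixed together.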
