
Decode MP3 from Memory #815

Closed
wants to merge 11 commits

Conversation


@jjedele jjedele commented Feb 27, 2020

No description provided.

Port operator definition and kernel.
Add operator to API.
Make the linters happy.
@jjedele
Author

jjedele commented Feb 27, 2020

The linter check fails, but it does not seem to be related to my changes.

@jjedele
Author

jjedele commented Feb 27, 2020

Open questions:

  • Is the API placed well right now or should this go in experimental?
  • Add test for stereo file

@jjedele
Author

jjedele commented Feb 27, 2020

@yongtang I got around to opening the first version of the PR. Please have a look and tell me your thoughts when you have some time :)

@yongtang
Member

@jjedele Thanks for the PR, overall looks good!

Some thoughts about the API:

  • If we can get the API right and allow future expansions then I think placing it under tfio.audio is fine.
  • While tf.audio.decode_wav has the desired_channels and desired_samples arguments, I am wondering if they are truly needed. If I understand correctly, desired_channels and desired_samples exist in tensorflow's core repo because in graph mode the samples and channels are needed to get the right shape [samples, channels] in DecodeWavShapeFn.
    However, the shape is not truly needed in eager mode with TF 2.0 (the shape function is not called in eager mode). And even in graph mode, it is OK to provide a shape with only unknown dimensions (e.g., [None, None]).
    My concern with desired_channels and desired_samples is that, if they conflict with the intrinsic channels and samples, they add additional error handling that makes things complicated. For example, if the user insists on desired_channels=2 while the audio truly has channels=1, we probably don't have a good way to resolve the conflict?
  • Another thing to consider, from an API point of view: do we want a list of tfio.audio.decode_mp3, tfio.audio.decode_mp4a, tfio.audio.decode_flac, etc., or could we just have a tfio.audio.decode() that decodes any audio clip into a tensor of [samples, channels] shape? Checking whether the file is mp3, mp4, flac, or ogg is actually possible; most of the time we only need to check the first few magic bytes. Would that be something we want to consider? (A usage sketch of both shapes follows below.)
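
For illustration, a rough usage sketch of the two API shapes under discussion (names and signatures are taken from the discussion above, not from a finalized tfio API, and the file is assumed to exist):

import tensorflow as tf
import tensorflow_io as tfio

contents = tf.io.read_file("sample.mp3")

# (a) one function per format
audio = tfio.audio.decode_mp3(contents)   # -> [samples, channels]

# (b) a single generic entry point that sniffs the magic bytes
audio = tfio.audio.decode(contents)       # format inferred from the data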

@jjedele
Author

jjedele commented Feb 28, 2020

@yongtang : Thanks for your input!

My thoughts:

desired_channels, desired_samples

For decode_wav I found this pretty useful because I was working with a non-RNN audio model which would take audio snippets of approx. 1s length, convert them to spectrograms, and then do CNN-based classification. For the CNN (and also for batching) it was nice that the API allowed me to get all the data samples with a deterministic shape.

I currently handle the situations you describe the same way decode_wav does. So e.g. if desired_channels==2 and actual_channels==1, the mono channel from the source is duplicated into both stereo channels of the output. The other cases are pretty standard cropping and padding.

However, you got me thinking whether this really should be a responsibility of the decoding functions or rather of some other part of the API. Instead of desired_samples, one could use tensor slicing or padded_batch, I think (see the sketch below). For desired_channels I have to think a bit more; I haven't worked with stereo models yet.
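
For illustration only, a minimal sketch of that alternative (the helper name is made up; it assumes the decoder returns a [samples, channels] tensor):

import tensorflow as tf

def fix_length(audio, desired_samples):
    # Crop or zero-pad along the time axis, outside the decoder.
    audio = audio[:desired_samples, :]
    pad = tf.maximum(desired_samples - tf.shape(audio)[0], 0)
    return tf.pad(audio, [[0, pad], [0, 0]])

# Or leave lengths ragged and pad at batch time instead:
# dataset = dataset.padded_batch(8, padded_shapes=[None, 2])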

decode() vs decode_format()

Good point. The situation is similar to the image decoding functions, for which they ended up offering both. My first experiences with the general decode_image have been a bit frustrating because it would produce a different shape for GIFs than for other image formats (I didn't know about the expand_animations flag). I currently can't think of similar issues for audio data though.

This is something we would need to implement at the operator level (as opposed to adding it to the Python API) because we need access to the actual data to make the decision, right?

--

I also added 2 questions about technical details into the code above (error reporting in TF and operator naming). For those I would also appreciate your input.

@jjedele
Author

jjedele commented Feb 28, 2020

The failing CI checks are not caused by my changes, right? Looks like generic git problems.

@yongtang
Member

@jjedele Yes, the failing CI is not a concern (likely the GitHub Actions checkout is not able to find the base commit); once you update, it will work I think.

@yongtang
Member

@jjedele Thanks!

decode() vs decode_format()

I think we can certainly offer both. A generic decode() is very useful in building a data pipeline where the input could be a mixture of audio clips with different formats and channel counts.

A decode_format() could be useful as well, as certain format types might need additional information. For example, mp3 seems to always default to float, though with mp4a the user may occasionally want to get the raw non-decoded sample frames (e.g., with ADTS header).

I think both will help users in different use cases, and the underlying implementation could be consolidated so that code is reused.

desired_channels, desired_samples

One thing we tend to favor is to push the optional args to the Python level whenever possible and keep the C++ level ops stable. The reason is that making changes to C++ can be hard for some contributors these days. With more code at the Python level it will be much easier to get more contributors involved in the project.

For example, even if the basic C++ level ops only expose a [None, None] shape, it is quite easy to add a wrapper to get the shape right in Python, e.g., through tf.reshape.

Also, for duplicating the mono channel into stereo channels, tf.broadcast_to could easily be added at the Python level to achieve the same goal. Compared to C++ changes, that Python code is much easier to debug and maintain.
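
For illustration, a minimal sketch of such Python-level shape fixing (these helpers are hypothetical and assume the C++ op returns a [None, None] tensor plus the intrinsic channel count):

import tensorflow as tf

def fix_shape(audio, samples, channels):
    # The C++ op may only advertise [None, None]; restore the static shape here.
    return tf.reshape(audio, [samples, channels])

def fix_channels(audio, channels, desired_channels):
    # Duplicate a mono channel into the desired number of channels
    # (only valid when the source is mono).
    if desired_channels is None or desired_channels == channels:
        return audio
    return tf.broadcast_to(audio, [tf.shape(audio)[0], desired_channels])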

@jjedele
Author

jjedele commented Feb 28, 2020

@yongtang : Yeah, what you're saying makes total sense. So let me summarize the plan:

  1. We simplify the current operator so that it does not modify the shape of the data.
  2. We lift the shaping functionality to Python level, i.e. tensorflow_io.core.python.ops.audio_ops
  3. We introduce a new DecodeAudio operator which looks at the header of the data and dispatches the decoding to the appropriate decoding operator.

Sounds good?

@yongtang
Member

@jjedele The summary looks good! Let me know if you need any help in bazel build/etc 👍

Lift functionality to fix shapes from C++ to Python level.
@jjedele
Author

jjedele commented Feb 29, 2020

Implemented and pushed the part that lifts the shaping functionality to the Python level. Going to look at the generic decode() operator next. I will have a look at the decode_image() source code and try to stay close to that.

@jjedele
Author

jjedele commented Feb 29, 2020

After initial investigation it unfortunately seems like media files are not always easily identifiable by the magic number in the header, e.g. https://stackoverflow.com/questions/11360286/detect-if-a-file-is-an-mp3-file

Hope the situation is better for Ogg, FLAC and MP4a.

@yongtang
Member

@jjedele We could start with having DecodeOp support mp3 initially, and gradually expand to other types in follow-up PRs.

For detecting an mp3 file, we could also try detecting the other formats first and fall back to mp3 as the "last" type. If it is still not an mp3 file, minimp3 will return an error anyway.

Below is the place in AudioIOTensor that checks ogg/flac/wav and falls back to mp3:

TF_RETURN_IF_ERROR(env_->NewRandomAccessFile(input, &file));
char header[8];
StringPiece result;
TF_RETURN_IF_ERROR(file->Read(0, sizeof(header), &result, header));
if (memcmp(header, "RIFF", 4) == 0) {
  return WAVReadableResourceInit(env_, input, resource_);
} else if (memcmp(header, "OggS", 4) == 0) {
  return OggReadableResourceInit(env_, input, resource_);
} else if (memcmp(header, "fLaC", 4) == 0) {
  return FlacReadableResourceInit(env_, input, resource_);
}
Status status = MP3ReadableResourceInit(env_, input, resource_);
if (status.ok()) {
  return status;
}

@jjedele
Author

jjedele commented Feb 29, 2020

@yongtang Thx for pointing me to that code.

For MP4 it also seems to work with the header (https://www.file-recovery.com/mp4-signature-format.htm).

I'm not a big fan of things happening implicitly, but in this case it's probably the best solution. Also, as long as the MP3 files have ID3 metadata (which I guess usually is the case), that can be identified as well.
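
For illustration, a rough Python sketch of that kind of magic-byte check (much simplified; as discussed, bare MP3 streams without an ID3v2 tag cannot be reliably identified this way):

def sniff_format(data: bytes):
    # Very rough container sniffing by leading magic bytes.
    if data[:4] == b"RIFF":
        return "wav"
    if data[:4] == b"OggS":
        return "ogg"
    if data[:4] == b"fLaC":
        return "flac"
    if data[4:8] == b"ftyp":
        return "mp4"   # MP4/M4A: an 'ftyp' box at offset 4
    if data[:3] == b"ID3":
        return "mp3"   # ID3v2-tagged MP3
    return None        # bare MP3 frames need a frame-level check (fallback)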

@yongtang
Member

yongtang commented Mar 1, 2020

@jjedele Another option is to add an Attr of format in Decode to forcefully stay with one format. For example, if users just want to decode a list of files with different formats, they could use:

audio = decode(input) # format = None

if they know exactly the format then they could also use:

audio = decode(input, format="mp3")

For many users, honestly, the format itself doesn't matter; they only want an API that decodes an audio file (in whatever format) without having to worry about all kinds of parameters. (We see a similar case with decode_image, which is used heavily.)

Some other users do want fine control over the format. So ideally I think an option of auto-probing the format makes sense.

@jjedele
Author

jjedele commented Mar 2, 2020

@yongtang Unfortunately didn't get to continue on this yet since I'm a bit busy with other things right now.

I think I would stay with the approach of specific decode_format methods plus a general decode method that infers the format automatically. Reason: if you already know the format, calling the right method instead of setting a parameter is not really more difficult. And if we consider having format-specific decoding options, we can clearly separate those into the individual decode methods. With a single method that takes format as a parameter, we would need to throw all these things together in one API and add lots of documentation like "this parameter is only considered if format == mp3", etc. Not a fan of the second solution. The first approach is also consistent with how image decoding is currently implemented in core TF.

@jjedele
Author

jjedele commented Mar 2, 2020

@yongtang I'm also thinking right now about what's the best approach to share the code between decode and decode_mp3. I've seen (https://github.com/tensorflow/io/blob/master/tensorflow_io/core/kernels/audio_kernels.cc#L83) that it is possible to call other kernels' compute methods from a kernel. Does making the decode operator a simple wrapper that looks at the header and then dispatches to the appropriate operator's compute method sound like a good idea to you?

@yongtang
Member

yongtang commented Mar 2, 2020

@jjedele Yes we want to reduce code duplication as much as possible, so reusing code in different parts would be great 👍

@jjedele
Author

jjedele commented Mar 2, 2020

@yongtang This is not so much a question of whether to reuse code, but rather of the best way to do it. I currently see 2 options:

  1. Leave the DecodeMp3Operator as it is currently and implement a DecodeOperator which directly calls DecodeMp3Operator::Compute.
  2. Extract the decoding logic into a new function with a signature similar to void* DecodeMp3(void *data) which would then be called from both operators.

Option 1 seems preferable to me since it's simpler. For 2 I would still need to duplicate the shape logic. I don't know whether option 1 causes any problems if we pass along the TF context objects, etc., though.

@yongtang
Member

yongtang commented Mar 2, 2020

@jjedele I would suggest going with 2, as honestly I don't know if there will be any implications if we go with 1 (with graph nodes, contexts, etc.).

@jjedele
Author

jjedele commented Mar 2, 2020

@yongtang Ok, I will do that! Thanks again for your help.

@lieff

lieff commented Mar 3, 2020

After initial investigation it unfortunately seems like media files are not always easily identifiable by the magic number in the header, e.g. https://stackoverflow.com/questions/11360286/detect-if-a-file-is-an-mp3-file

I can write a helper function for mp3 detection. Basically we need to check several consecutive frames to prove this is really mp3 if id3v2 is absent, plus support damaged files like https://github.com/lieff/minimp3/blob/master/vectors/l3-sin1k0db.bit .

@jjedele
Author

jjedele commented Mar 3, 2020

@lieff Thx for offering! For my use case where we already have the whole data in memory this would be awesome.

I'm a bit unsure yet how we would integrate it for the file-based reader. We would probably need to read a considerably bigger piece than just the file header for the check. But it might be worth it as long as it stays in the kB range.

@yongtang
Member

yongtang commented Mar 3, 2020

Thanks @lieff for offering help!

@jjedele File-based access is possible, as TensorFlow's FileSystem is really a set of callback functions with random-offset Read and GetFileSize. That should be enough for any processing. On a side note, this is also how TensorFlow handles storage schemes like s3 or gcs that are not traditional local files.

Minimp3 already has a callback API (again thanks @lieff for the great work 👍 ), so it is quite easy to wire the two sets of callbacks together.

The following is how the callback is wrapped to use mp3 for processing IOTensor and Dataset:

class MP3Stream {
 public:
  MP3Stream(SizedRandomAccessFile* file, int64 size)
      : file(file), size(size), offset(0) {}
  ~MP3Stream() {}

  static size_t ReadCallback(void* buf, size_t size, void* user_data) {
    MP3Stream* p = static_cast<MP3Stream*>(user_data);
    StringPiece result;
    Status status = p->file->Read(p->offset, size, &result, (char*)buf);
    p->offset += result.size();
    return result.size();
  }

  static int SeekCallback(uint64_t position, void* user_data) {
    MP3Stream* p = static_cast<MP3Stream*>(user_data);
    if (position < 0 || position > p->size) {
      return -1;
    }
    p->offset = position;
    return 0;
  }

  SizedRandomAccessFile* file = nullptr;
  int64 size = 0;
  long offset = 0;
};

In fact, even when the whole file is in memory, the callbacks can be applied as well by simply translating them into direct memory access.

@lieff

lieff commented Mar 3, 2020

We would probably need to read a considerable bigger piece than just the file header for testing.

Yes, ~16kb worst case is needed for 10 consecutive frames. I'll create detect functions and note here when they're ready.

@lieff

lieff commented Mar 4, 2020

Here are the new detect functions: lieff/minimp3@0a2ff3b .
They return zero if detection succeeds, MP3D_E_USER if it fails, or MP3D_E_IOERROR on an IO error.

Refactor to work towards general DecodeAudio operator.
@jjedele
Author

jjedele commented Mar 4, 2020

Thx for doing this so quickly @lieff !

@jjedele
Author

jjedele commented Mar 4, 2020

New ToDo list:

  • Add @lieff 's MP3 identification function. Probably we'll need to update the dependency in the build file?
  • Create a DecodeAudioBaseOp so we do not have to duplicate the output tensor creation logic.
  • Implement generic decode() operator.

@yongtang
Member

yongtang commented Mar 4, 2020

Thanks @lieff for the help!

@jjedele you can update the workspace file in

io/WORKSPACE

Lines 724 to 732 in 7a19d34

http_archive(
    name = "minimp3",
    build_file = "//third_party:minimp3.BUILD",
    sha256 = "53dd89dbf235c3a282b61fec07eb29730deb1a828b0c9ec95b17b9bd4b22cc3d",
    strip_prefix = "minimp3-2b9a0237547ca5f6f98e28a850237cc68f560f7a",
    urls = [
        "https://github.com/lieff/minimp3/archive/2b9a0237547ca5f6f98e28a850237cc68f560f7a.tar.gz",
    ],
)

The strip_prefix and urls fields should be updated with the new git commit, and sha256 is the new sha256 of the .tar.gz file.
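
As an aside, a quick local sketch for computing the new sha256 of the archive (assuming Python; sha256sum on the downloaded .tar.gz works just as well; the URL in the comment is a placeholder):

import hashlib
import urllib.request

def archive_sha256(url):
    # sha256 of the .tar.gz, as expected by the http_archive rule.
    with urllib.request.urlopen(url) as resp:
        return hashlib.sha256(resp.read()).hexdigest()

# archive_sha256("https://github.com/lieff/minimp3/archive/<new-commit>.tar.gz")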

Implement general DecodeAudio operator.
Add lieff's detect_mp3 function.
Author

@jjedele jjedele left a comment

@yongtang Implemented the general audio.decode operator now and wired it together with the MP3 decoding. The code is in a state that seems OK to me, but I'm not an experienced C++ programmer, so I'm happy about any feedback. I would suggest that we finish up this PR and then implement decoding for the other formats in follow-up PRs.

// DecodeAudioBaseOp
DecodeAudioBaseOp::DecodeAudioBaseOp(OpKernelConstruction *context) : OpKernel(context) {}

void DecodeAudioBaseOp::Compute(OpKernelContext *context) {
Author

I'm a bit unhappy with the fact that these methods are in tensorflow::data while everything else is in this nested anonymous namespace. I'm not experienced enough with C++ to know what this is about, so I'm happy about feedback/ideas.


// DecodedAudio
size_t DecodedAudio::data_size() {
return channels * samples_perchannel * sizeof(int16);
Member

I think Google's code style prefers CamelCase method names, so DataSize instead?

Also, not all audio files use int16, so this has to at least take the data type into consideration. But that is a larger discussion.

@@ -30,23 +108,27 @@ class AudioReadableResource : public AudioReadableResourceBase {
mutex_lock l(mu_);
std::unique_ptr<tensorflow::RandomAccessFile> file;
TF_RETURN_IF_ERROR(env_->NewRandomAccessFile(input, &file));
char header[8];
char header_buf[8];
Member

I think header is fine here?

public:
DecodeAudioOp(OpKernelConstruction *context) : DecodeAudioBaseOp(context) {}

std::unique_ptr<DecodedAudio> decode(StringPiece &data, void *config) {
Member

Google's style is to return a Status as the function's return value; any values/pointers that also need to be returned are placed at the end of the parameter list as pointers, so something like:

Status Decode(StringPiece& data, DecodedAudio** audio);

Then you could use:

DecodedAudio* audio;
Status status = Decode(data, &audio);

std::unique_ptr<DecodedAudio> d;
d.reset(audio);

You might also pass a unique_ptr as well I think:

Status Decode(StringPiece& data, std::unique_ptr<DecodedAudio>* audio);

Author

Passing a pointer to a unique pointer sounds pretty hacky; do you think that's a good idea? I'd probably rather go with the first option. Thx for pointing me to the coding style!

DecodeAudioOp(OpKernelConstruction *context) : DecodeAudioBaseOp(context) {}

std::unique_ptr<DecodedAudio> decode(StringPiece &data, void *config) {
auto error = std::unique_ptr<DecodedAudio>(new DecodedAudio(false, 0, 0, 0, nullptr));
Member

I think this does not capture the error situation.

@@ -45,5 +54,40 @@ Status MP4ReadableResourceInit(
Env* env, const string& input,
std::unique_ptr<AudioReadableResourceBase>& resource);

// Container for decoded audio.
class DecodedAudio {
public:
Member

Not sure we need a class here, as it is just a struct with one function that does a

channels * samples_perchannel * sizeof(int16);

maybe we don't need this class after all?

Author

Yes, I think you're right. I think a struct is enough.

const int sampling_rate;
// should first contain all samples of the left channel
// followed by the right channel
const int16 *data;
Member

I think we could avoid allocating the memory here, as we can create the output Tensor which will hold the memory. Then the output Tensor can be used directly to get at the data?

Author

I was thinking about this for a while, but I'm not sure how it would work. The problem is I need a shape to allocate the output tensor, which I get by decoding the MP3, for which in turn I need to have memory.

Member

In the case of the shape, in:

Status Read(const int64 start, const int64 stop,

a callback-style lambda is passed which allows the allocation to be done once the shape is ready. This would be helpful when we only want to call read once.

public:
DecodeAudioOp(OpKernelConstruction *context) : DecodeAudioBaseOp(context) {}

std::unique_ptr<DecodedAudio> decode(StringPiece &data, void *config) {
Member

Also, I don't see the decode here? It seems to only do a classify?


@yongtang
Member

yongtang commented Mar 5, 2020

@jjedele If you take a look at the existing MP3ReadableResource, you probably noticed that if you replace:

file_.reset(new SizedRandomAccessFile(env_, filename, nullptr, 0));

with the buffer's data and size, as in:

file_.reset(new SizedRandomAccessFile(env_, filename, buffer, length));

you pretty much have a memory-backed mp3 decoder in place.

After that, you can check the intrinsic shape ([samples, channels]) and dtype (int16/float32/etc) through:

 Status Spec(TensorShape* shape, DataType* dtype, int32* rate)

Once you have the shape and dtype, you can just read the whole thing into the output Tensor, and the decode_mp3 op is pretty much complete. Have you considered this approach as well?

@jjedele
Author

jjedele commented Mar 5, 2020

@yongtang Thx for your feedback! I actually hadn't considered the approach you mention in your last comment. I just assumed that SizedRandomAccessFile would need to be backed by a real file. Now that you mention it, I looked at the implementation and see that what you say would likely work.

I'm not sure how much I like the approach though. If we start doing this, it means we must from now on always ensure that it works without a real file in the background. And since the actual decoding logic in lieff's library is already super easy to use, I don't think we would actually save that much code by doing this.

@yongtang
Member

yongtang commented Mar 5, 2020

@jjedele The mp3 (and to an extent mp4a) is not very challenging to decode, thanks to @lieff's great libraries. We do have different use cases for audio, though:

  1. Sequential access (AudioIODataset, to be passed to tf.keras, and lazily loaded)
  2. Random access (AudioIOTensor, allows __getitem__, and lazily loaded)
  3. Basic ops that decode memory (non-lazily loaded, as the content is already in memory beforehand)

We would like to come up with some way to avoid duplicating code in many places. Opened issue #839 for further discussion.

@jjedele
Author

jjedele commented Mar 8, 2020

@yongtang Should I continue adding the discussed changes here, or do you think #839 will lead to a bigger redesign of the API that we should do upfront?

@yongtang
Member

yongtang commented Mar 8, 2020

@jjedele We prefer multiple smaller PRs over one big PR. For this PR I think we can keep the focus on decode and decode_mp3; we just need to sync up with the overall picture being discussed in #839.

Also, besides mp3 there are several other types (wav, flac, ogg, mp4) that are more or less ready to be exposed as decode_format. We can add them in follow-up PRs, and if you don't mind, I can help work on some of them.

@yongtang
Member

yongtang commented Mar 8, 2020

@jjedele Also, given the ongoing discussion in #839, maybe we can temporarily place the API in tfio.experimental.audio? Once we get the majority complete, we could batch-move it to tfio.audio.

We plan on having the next tensorflow-io release for TF 2.2. Since TF 2.2 is likely to be released in 4+ weeks (the RC is not out yet), we have roughly a month or more to get everything in place and move from tfio.experimental.audio to tfio.audio by then.

@jjedele
Author

jjedele commented Mar 8, 2020

@yongtang Sounds good. I will move it to experimental.

@jjedele
Author

jjedele commented Mar 18, 2020

@yongtang I was looking at your code a bit, and right now it seems to me that we would introduce a lot of duplication of this whole shaping code if we create separate operators for the different decode operations, without gaining anything at this point. Maybe we should just start with the generic one and then differentiate further if we actually find/need codec-specific parameters.

@yongtang
Member

@jjedele I think what we could do is:

  1. Leave the Python interface alone.
  2. In the C++ kernel, have only one AudioDecodeOp kernel, which takes one additional Attr of encoding to optionally pass an encoding (mp3, mp4a, etc.) so that it can be routed to the right processing code.
  3. In the Python implementation, each decode_format calls the C++ binding io_audio_decode(..., encoding="mp3") (see the sketch below).
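
A minimal sketch of what step 3 could look like at the Python level (the io_audio_decode binding name follows the comment above; it and the import path are assumptions, not the actual implementation):

from tensorflow_io.core.python.ops import core_ops  # assumed location of the generated bindings

def decode_mp3(contents):
    # Thin wrapper over the single C++ kernel, routed by the `encoding` attr.
    return core_ops.io_audio_decode(contents, encoding="mp3")

def decode_flac(contents):
    return core_ops.io_audio_decode(contents, encoding="flac")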

@jjedele
Author

jjedele commented Mar 19, 2020

@yongtang That sounds like a good idea. But maybe that should be another task since I've just seen that in the meantime we already have decode_flac, decode_ogg, etc.

On that note: at this point, it seems to me that decode_mp3 would mostly be a copy/paste of one of the other operators, since we can now reuse the ReadableResource implementations. Given that I currently don't have much time and am blocked until I can resolve #855 , maybe the most efficient thing would be to close this PR and have you add the DecodeMP3Operator from a clean branch like the others. Then we can open a new issue for the general decode operator, and I or somebody else can work on it at some point later.

What do you think?

@yongtang
Member

@jjedele Sure, let me take a look.
