
Split big files #75

Closed
wants to merge 15 commits into from

Conversation

thesunlover
Contributor

Added support for huge files.
Check long_test.py and adjust the minutes and process count to fit in the available RAM.

Edit:
Procedure of the process:

  1. Splits the large audio file into smaller, fingerprintable pieces.
  2. Fingerprints them one by one.
  3. Saves them as a single song in the database.
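The three steps above can be sketched as follows. This is a minimal illustration, not the PR's actual API: `chunk_seconds` and the stubbed fingerprinting step are assumptions.

```python
def split_points(total_seconds, chunk_seconds):
    """Return (start, end) second ranges covering the whole file."""
    points = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        points.append((start, end))
        start = end
    return points

def fingerprint_large_file(total_seconds, chunk_seconds=180):
    # 1. split the large file into fingerprintable pieces
    pieces = split_points(total_seconds, chunk_seconds)
    # 2. fingerprint each piece one by one (stubbed out here)
    hashes = [("hashes-for-piece", start, end) for start, end in pieces]
    # 3. all hashes end up under a single song entry in the database
    return {"pieces": len(pieces), "hashes": hashes}

result = fingerprint_large_file(58 * 60)  # a 58-minute file, as in long_test.py
```

A 58-minute file split into 3-minute pieces yields 20 chunks, each small enough to fingerprint in memory.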

@thesunlover thesunlover mentioned this pull request Mar 23, 2015
@worldveil
Owner

So to be clear, this splits up large files over a certain length (i.e., 3 minutes), and fingerprints each as a separate "song", yes?

@thesunlover
Contributor Author

Updated the description of the PR.

@@ -16,6 +32,9 @@ class Dejavu(object):
OFFSET = 'offset'
OFFSET_SECS = 'offset_seconds'

SPLIT_DIR = "split_dir"
OVERWRITE_WHEN_SPLITING = 1
Owner


overwrites what?

Contributor Author


It's about overwriting the temp files that were split with ffmpeg.
It probably wouldn't be needed if we deleted the temporary directory.
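A minimal sketch of that alternative, assuming a throwaway per-run temp directory (the directory prefix and chunk names here are hypothetical): if the split pieces live in a directory created per run and removed afterwards, no overwrite flag is needed at all.

```python
import os
import shutil
import tempfile

# Create a throwaway directory for the ffmpeg-split pieces, use it, delete it.
split_dir = tempfile.mkdtemp(prefix="dejavu_split_")
try:
    # ... ffmpeg would write the chunk files into split_dir here ...
    chunk_path = os.path.join(split_dir, "chunk_000.mp3")
    open(chunk_path, "wb").close()  # stand-in for a real chunk
finally:
    shutil.rmtree(split_dir)  # nothing is left over, so nothing to overwrite
```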

Contributor Author


I probably should have coded it as "always overwrite", with no conditions.

Contributor Author


Renamed the constant to OVERWRITE_TEMP_FILES_WHEN_SPLITING.

@worldveil
Owner

Also the binary files (large mp3s) are too large and I'd rather not increase the size of the repo with those.

Can you instead give a public link (could even download with urllib2 in example.py) to some copyright-free music?

@thesunlover
Contributor Author

OK, will do it later this evening.
Edit:
Got rid of the mp3 files and modified the test file so that it generates a new file from the content of the existing mp3 files.
Modified the test so that it uses the generated file (a 58-minute file is generated).

@thesunlover
Contributor Author

Removed the non-copyright-free files and added auto-creation of the long file with "ffmpeg concat".
Added tree removal of the completed long file.
Please review once again.
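The "ffmpeg concat" step can be sketched like this. The file names are made up for illustration; the real test builds its own list from the generated audio. ffmpeg's concat demuxer reads a text file with one `file '<name>'` line per input and can join the files without re-encoding.

```python
def concat_list_lines(inputs):
    """Lines for the concat demuxer's list file: one "file '<name>'" per input."""
    return ["file '%s'" % name for name in inputs]

def concat_command(list_file, output):
    """An ffmpeg invocation that joins the listed files without re-encoding."""
    return ["ffmpeg", "-f", "concat", "-safe", "0",
            "-i", list_file, "-c", "copy", output]

lines = concat_list_lines(["part1.mp3", "part2.mp3"])
cmd = concat_command("inputs.txt", "long_test.mp3")
```

Running `cmd` through `subprocess.check_call` would produce the long test file; afterwards the test can remove it along with its temp directory.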

@thesunlover
Contributor Author

Do we need to test with other formats like ogg and flv?

@worldveil
Owner

If pydub handles them, I think it should be fine.

Will review this sometime this weekend I hope.

OVERWRITE_TEMP_FILES_WHEN_SPLITING
song_name_for_the_split
@thesunlover
Contributor Author

I had not thought about using a 1-minute limit and splitting the existing files.

If I set the maximum audio length for straight fingerprinting to 1 minute and use all the available CPU cores, it might be faster to fingerprint even short songs: full CPU usage and lower memory usage at the same time.

What do you say, should I test that?
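A sketch of that idea with `multiprocessing.Pool`, assuming hypothetical helper names; `fingerprint_chunk` here is a stand-in for the real spectrogram hashing:

```python
import multiprocessing

def minute_chunks(total_seconds, chunk_seconds=60):
    """1-minute pieces, so each worker's chunk fits in RAM."""
    return [(start, min(start + chunk_seconds, total_seconds))
            for start in range(0, total_seconds, chunk_seconds)]

def fingerprint_chunk(bounds):
    start, end = bounds
    return ("hashes", start, end)  # stand-in for real fingerprinting

def parallel_fingerprint(total_seconds, processes=None):
    """Fan the chunks out over all available cores."""
    chunks = minute_chunks(total_seconds)
    with multiprocessing.Pool(processes or multiprocessing.cpu_count()) as pool:
        return pool.map(fingerprint_chunk, chunks)
```

Even a 3-minute song becomes three chunks, so a multi-core machine stays fully busy while each worker holds only one minute of audio in memory.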

@thesunlover
Contributor Author

In the last 2 commits I moved a few arguments to be properties of Dejavu.
Should I move them into the config file?

@worldveil
Owner

Will review this week. Apologies for the delay!

@worldveil
Owner

OK, I'm pulling this for review right now.

First things first, we need to remove the large binary files from the git history. You removed them from current version, but they are still in history (.git folder is 89MB).

Second, yes the arguments for fingerprinting and splitting large files should be in the config. And we should document those options in README.md as well.

Many thanks for your patience, and apologies for the delay in getting back to you!

@thesunlover
Contributor Author

I think it would be easier for me to:
start a new branch,
copy the changes there,
close this pull request,
and reopen it with the new branch.

Is that OK?

@worldveil
Owner

sounds fine to me.

@worldveil
Owner

Were you able to create a new PR with this? Would gladly merge.

@worldveil worldveil closed this Aug 3, 2015
@worldveil worldveil reopened this Aug 3, 2015
@thesunlover
Contributor Author

Hello, worldveil.
I hope I'll have enough time this evening to do it at home.

@thesunlover
Contributor Author

Reposted the PR here:
#87

@thesunlover
Contributor Author

@worldveil would you please review the new PR #87 and give advice on how to properly calculate the offset_seconds for the parts that follow the first one?

@NathanielCustom

@thesunlover I left a comment in PR #87 on how I got offset_seconds working.
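One common way to map the offset back, assuming fixed-length chunks (the function name is illustrative; PR #87 may do this differently): a match found at some local offset inside chunk N actually sits at `N * chunk_length + local_offset` in the full file.

```python
def global_offset_seconds(chunk_index, chunk_length_seconds, local_offset_seconds):
    """Map an offset matched inside chunk N back onto the full file's timeline."""
    return chunk_index * chunk_length_seconds + local_offset_seconds

# A match 12.5s into the third 60-second chunk is 132.5s into the whole file.
offset = global_offset_seconds(2, 60, 12.5)
```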
