
Issues with mmap files larger than 4GB on Windows 64bit operating systems #58

Closed
unbemannt opened this issue Dec 14, 2017 · 6 comments

@unbemannt
Contributor

unbemannt commented Dec 14, 2017

First, thanks for making this library available.

I am having an issue with some Java test code that uses larray mmap. I am unable to mmap files larger than 4GB on 64-bit Windows, while the same Java test code runs fine on Linux (30GB is no problem). Maybe I am not using the API properly.

I'm getting a core dump with a 5GB file on Windows. It almost feels as if the 32-bit larray native binary is being loaded, since the file limit appears to be around 4GB (4GB works, 5GB does not).

System info:
Windows 10 Pro 64-bit or Ubuntu Linux 17.10 64-bit (same machine, dual boot)
JDK 1.8u151
larray-buffer:0.4.1
larray-mmap:0.4.1

Example code:

```java
package testmmap;

import xerial.larray.mmap.MMapBuffer;
import xerial.larray.mmap.MMapMode;

import java.io.File;
import java.io.IOException;
import java.util.concurrent.ThreadLocalRandom;

/**
 * Creates 4GB (4,000,000,000 bytes, just under 2^32) and 5GB (5,000,000,000 bytes)
 * files of random data, then reads each entire file to find its min/max values.
 */
public class TestMmap {

    private static void createLargeFile(File file, long bytes) throws IOException {
        long t0 = System.currentTimeMillis();
        System.out.println("\n\nCreating file = " + file);
        MMapBuffer data = new MMapBuffer(file, 0, bytes, MMapMode.READ_WRITE);
        // Fill the mapped region with random longs, 8 bytes at a time
        for (long index = 0; index < bytes; index += 8) {
            long value = ThreadLocalRandom.current().nextLong(0, Long.MAX_VALUE);
            data.putLong(index, value);
        }
        data.flush();

        System.out.println("Time = " + (System.currentTimeMillis() - t0));
        System.out.println("Size = " + data.size());

        data.close();
    }

    private static void readLargeFile(File file) throws IOException {
        long t0 = System.currentTimeMillis();
        System.out.println("\n\nReading file = " + file);
        MMapBuffer data = new MMapBuffer(file, MMapMode.READ_ONLY);
        long bytes = data.size();
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        // Scan the whole mapping; accesses past 2^32 bytes are where Windows fails
        for (long index = 0; index < bytes; index += 8) {
            long value = data.getLong(index);
            min = (value < min) ? value : min;
            max = (value > max) ? value : max;
        }

        System.out.println("Time = " + (System.currentTimeMillis() - t0));
        System.out.println("Size = " + bytes);
        System.out.println("Min = " + min);
        System.out.println("Max = " + max);

        data.close();
    }

    public static void main(String[] args) throws Exception {
        long[] byteSizes = {4000000000L, 5000000000L};
        for (long bytes : byteSizes) {
            File file = new File(String.format("mmap%d.out", bytes));
            if (!file.exists())
                TestMmap.createLargeFile(file, bytes);
            TestMmap.readLargeFile(file);
        }
    }
}
```
@unbemannt
Contributor Author

I also tried building the jars locally on Ubuntu with the g++-mingw-w64-x86-64 package installed. I cloned master, built the `*-0.4.2-SNAPSHOT` jars, copied them over to Windows, and rebuilt the project. Same result: 4GB works, 5GB does not.

I ran `make win64`, then `./sbt compile` and `./sbt package` to get the jars.

@xerial
Owner

xerial commented Dec 15, 2017

Thanks for reporting. Since it works on Ubuntu, this looks like a Windows-specific problem. 4GB is the 2^32 boundary, so some larray code might be using an invalid address when accessing an mmap memory region beyond 4GB.

The Windows-specific code is around here:

```c
#if defined(_WIN32) || defined(_WIN64)
    void *mapAddress = 0;
    jlong maxSize = offset + size;
    // Split the 64-bit mapping size and offset into the high/low 32-bit halves WinAPI expects
    jint lowLen = (jint) (maxSize);
    jint highLen = (jint) (maxSize >> 32);
    jint lowOffset = (jint) offset;
    jint highOffset = (jint) (offset >> 32);
    HANDLE fileHandle = (HANDLE) fd;
    HANDLE mapping;
    DWORD mapAccess = FILE_MAP_READ;
    DWORD fileProtect = PAGE_READONLY;
    BOOL result;
    if (mode == 0) {          // read-only
        fileProtect = PAGE_READONLY;
        mapAccess = FILE_MAP_READ;
    } else if (mode == 1) {   // read-write
        fileProtect = PAGE_READWRITE;
        mapAccess = FILE_MAP_WRITE;
    } else if (mode == 2) {   // copy-on-write
        fileProtect = PAGE_WRITECOPY;
        mapAccess = FILE_MAP_COPY;
    }
    mapping = CreateFileMapping(fileHandle, NULL, fileProtect, highLen, lowLen, NULL);
    // The view size is passed as a single value here (no high/low split)
    mapAddress = MapViewOfFile(mapping, mapAccess, highOffset, lowOffset, (DWORD) size);
    result = CloseHandle(mapping);
    return (jlong) mapAddress;
```

I haven't been using Windows recently, so it will be difficult for me to address this issue soon. It's great that you can build larray yourself. Tweaking the code around https://github.com/xerial/larray/blob/598607d2c7ec56b328bd856c6913f5d26773910f/larray-mmap/src/main/java/xerial/larray/mmap/MMapBuffer.java and the native code above should help fix this issue.

Thanks

@unbemannt
Contributor Author

I finally got to take another look. I traced it down to the MapViewOfFile call on line 70 in LArrayNative.c: the size parameter should be cast to (size_t), not (DWORD), since MapViewOfFile's last parameter (dwNumberOfBytesToMap) is a SIZE_T. It's working as expected now.

@xerial
Owner

xerial commented Feb 20, 2018

@unbemannt Good catch! Could you create a PR for this?

@kosiakk

kosiakk commented Jun 4, 2018

I hope it solves the problem, thank you.
I also see that the WinAPI accepts zeros for the sizes: the mapping size is fixed anyway. So, according to the documentation, passing 0 to both CreateFileMapping and MapViewOfFile is OK.

I'd be glad to test this or any other bug fixes. Luckily I have access to a fat Windows Server with 1.5 TB of RAM :)

But right now, the problem still exists for a 5 GB file.

@xerial
Owner

xerial commented Jun 4, 2018

@kosiakk I haven't released a new version with this fix (#59) yet.

I'll try to use docker-based cross compilation to make it easier to build binaries for Windows, Linux, and Mac OS X.
