Multiple devices per datanode, per tier #26
Comments
I don't believe Crail supports multiple devices per datanode for a specific tier. For instance, exporting two NVMf targets from a storage tier on a single datanode, like:
crail.storage.blkdev.datapath /dev/nvme0n1,/dev/nvme1n1
I'm trying to scope out how much effort this would be, but I first wanted to check whether there are already any plans or existing work to support such functionality. This would probably be most useful in the blk-dev repo, where we could expose multiple iSCSI/NVMf targets to a namenode, as in the conf example above.
Thanks,
Tim
Hi Tim,
The model in Crail is that datanodes are per device (NIC, SSD, etc.).
Scaling to multiple NICs or SSDs is done by starting multiple datanodes,
one per device. Essentially an IP/port is exposing a storage namespace of a
host, so the handling of multiple devices is done at the global Crail
level. The design was chosen to keep datanodes and metadata server simple.
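To make that concrete, here is a sketch of the per-device pattern for the block device tier (the datapath property is the one from your example above; splitting the configuration per process, and the note about ports, illustrate the general pattern rather than quoting Crail documentation):
conf for datanode process 1, managing the first device:
crail.storage.blkdev.datapath /dev/nvme0n1
conf for datanode process 2 on the same host, managing the second device and listening on its own port:
crail.storage.blkdev.datapath /dev/nvme1n1
Both processes register with the same namenode, which then sees two independent storage endpoints on that host.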
That being said, there is nothing in the Crail storage interface that wouldn't
permit building a new datanode which exposes multiple devices. It should be
very simple to build, functionally. One has to be careful, though, with
performance. Currently the metadata server has no knowledge about the
storage topology inside a single datanode. If a datanode exports multiple
devices, these registrations will appear at the namenode as resources of a
single host. Consequently the namenode cannot distribute block allocations
over the different devices during file writes. That could be fixed by
registering individual blocks in a round robin manner with the namenode
instead of registering entire regions. But it's not a great fix. If we see
that there is a real need for more complex datanodes handling multiple
devices, we may need to add explicit support for it at the namenode.
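To illustrate the round-robin variant mentioned above (a sketch only; Device, Namenode, and registerBlock are hypothetical stand-ins, not the actual Crail storage interfaces):

// A datanode managing several devices could register individual blocks
// with the namenode, alternating over its devices, instead of one large
// region per device. All types below are hypothetical.
interface Device {
    long allocate(long size);   // reserve a block on the device, return its offset
    String address();           // device-qualified storage endpoint
}
interface Namenode {
    void registerBlock(String address, long offset, long length);
}
class MultiDeviceDatanode {
    private final java.util.List<Device> devices;  // e.g. /dev/nvme0n1, /dev/nvme1n1
    private final Namenode namenode;
    private int next = 0;                          // round-robin cursor

    MultiDeviceDatanode(java.util.List<Device> devices, Namenode namenode) {
        this.devices = devices;
        this.namenode = namenode;
    }

    // Register blockCount blocks, cycling over the devices so that
    // consecutive registrations come from different devices.
    void registerBlocks(int blockCount, long blockSize) {
        for (int i = 0; i < blockCount; i++) {
            Device dev = devices.get(next);
            next = (next + 1) % devices.size();
            namenode.registerBlock(dev.address(), dev.allocate(blockSize), blockSize);
        }
    }
}

Because blocks from the different devices interleave in the namenode's free list, consecutive allocations during a file write would land on different devices, even though the namenode still sees just one host.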
Currently I don't see such a need, but feel free to provide details about
your use cases and why starting datanodes per device is not sufficient. I'd
be interested in hearing more.
Thanks!
-Patrick
Thanks for the response Patrick. The use case I was thinking of was performance-based: with multiple NVMe drives per datanode, we wanted to see if their aggregate bandwidth can get close to that of memory. However, there are different ways to accomplish this - such as with containers or volume managers - without having to modify Crail's design.
Hi Tim,
Crail is all about performance. If a single datanode managing multiple
devices gives better performance than multiple datanodes managing a single
device each, then we should consider the option. Have a look at our NVMf blog:
http://www.crail.io/blog/2017/08/crail-nvme-fabrics-v1.html
There, each storage server has two SSDs and runs two Crail datanodes, one
per SSD. We reach the network line speed easily by aggregating the
bandwidth of the two SSDs. In the past we have also done similar
experiments with 4 SSDs, and there too we were able to aggregate the
bandwidth of the devices using the "one datanode per device" approach.
Please let us know why you think having a single datanode managing multiple
devices will give better performance.
-Patrick
Hi,
My objective is to run multiple SSDs on a physical server. I didn't fully understand your first reply, but now I see that what you are suggesting is another way to achieve the same thing. I was assuming only one datanode per storage server. I like Crail's design because you don't have to worry about making a datanode scale to the number of SSDs you give it.
Tim
Hi Tim,
Yes, that was one of the ideas behind the design. But we can always
re-evaluate it at any point in time...
-Patrick