Better handling of the fact that nodes have individual disks #1830

Closed
scottyeager opened this issue Nov 11, 2022 · 11 comments

@scottyeager

Currently, 3Nodes don't make any information available about the size of the individual disks they contain. Storage instead appears as a single available quantity of SSD and, when applicable, HDD, out of a single total. This can create the appearance that a node can support a disk reservation up to the full available quantity, when in fact it can only support a reservation up to the largest free block on a single disk.

To improve the experience for deployers, we could:

  1. Make individual disk info available for query over RMB. Then interfaces can at least help the user reserve within the limits of a single disk.
  2. Provide virtual disks that span multiple physical disks in the node, so reservations up to the total available capacity of the node work seamlessly. A consideration with this approach is that some users may wish to keep their reservations on separate physical disks, for example to use software RAID schemes, so information about the size of individual disks would still be relevant.

Likewise, farmers can benefit from greater visibility into the disks in their nodes. This can help when disks are not recognized, incorrectly recognized (SSD as HDD), or have failed. For farmers:

  1. Give information about how Zos sees the disks, including their ordering, such as /dev/sda == 1TB SSD, /dev/sdb == 2TB HDD (existing filesystem detected, disk not used).
  2. Show this information on the node console so it's immediately visible when the node boots, and also make it available over RMB (a sketch of what such a per-disk report could contain follows this list).
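
For illustration, here is a minimal sketch of the per-disk report both lists imply. It is purely hypothetical, written in Go only because zos itself is written in Go; the type and field names are not an existing zos API, and the values are taken from the examples above:

```go
package main

import "fmt"

// DiskInfo is a hypothetical per-disk report entry; the fields mirror the
// examples in this issue, not an existing zos type.
type DiskInfo struct {
	Device string // e.g. "/dev/sda"
	Type   string // "SSD" or "HDD"
	Size   uint64 // total size in bytes
	Free   uint64 // largest block still reservable, in bytes
	Note   string // e.g. "existing filesystem detected, disk not used"
}

func main() {
	report := []DiskInfo{
		{Device: "/dev/sda", Type: "SSD", Size: 1 << 40, Free: 1 << 40},
		{Device: "/dev/sdb", Type: "HDD", Size: 2 << 40, Note: "existing filesystem detected, disk not used"},
	}
	for _, d := range report {
		fmt.Printf("%s == %d GiB %s %s\n", d.Device, d.Size>>30, d.Type, d.Note)
	}
}
```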
@scottyeager scottyeager added the type_feature New feature or request label Nov 11, 2022
@xmonader xmonader added this to the 3.5.x milestone Nov 14, 2022
@muhamadazmy
Member

Yes, I have been raising this issue (internally) for a while. It is indeed very bad that we show the total capacity across all disks instead of the individual disks, for exactly the reasons you mention.

But I would like to comment on the approach.

  • Individual disk capacities should be reported and shown on the chain, instead of the aggregated capacity.
  • No RMB should be involved in this, since all the information will be available on the chain.
  • There is no way (at the moment) to create a virtual disk that spans multiple physical disks without a huge impact. This would only be possible if we used something like LVM or ran btrfs over multiple disks, but we decided not to do that, to make it easier to replace failed disks.

Doing this query over RMB would of course be easier, since it doesn't require model changes on the chain, but then selecting a node that matches a workload requires a lot of queries to multiple nodes, which will slow things down a lot.

@muhamadazmy muhamadazmy self-assigned this Nov 14, 2022
@muhamadazmy muhamadazmy added this to To do in backlog via automation Nov 14, 2022
@muhamadazmy muhamadazmy modified the milestones: 3.5.x, 3.6.x Nov 14, 2022
@scottyeager
Author

scottyeager commented Nov 14, 2022

This sounds good, and I agree that adding the data to TF Chain is the best overall solution for the deployment side.

For farmers, I still think it would be nice to have more info, such as about disks that were passed over due to existing data, available over RMB or at least printed to the console. It's pretty common for new farmers to have some disks not detected when they boot up their first nodes, and being able to know whether Zos sees the disks at all would be really helpful.

Not providing a built-in solution to span disks is okay with me for now, too. This can be accomplished after deployment with a single command for btrfs.

I think it would be nice to provide the option to specify which physical disk a virtual disk is reserved on, so users can achieve RAID 1 in software across multiple physical disks, for example, without needing to reserve all available space on the first disk Zos allocates from.

@Parkers145

Users have been bringing up this problem in chat again recently. Have we made any progress here?

@DylanVerstraete
Contributor

I'm not sure why this cannot be queried client side; if a user is interested in a specific node, they can query that node for the disk setup. Adding this data on chain does not seem like the right decision here, as we actually want to minimize the data stored on chain.

@Parkers145

Parkers145 commented Jan 31, 2023

Does that mean every person that uses the grid would have to learn to use GraphQL?

@DylanVerstraete
Contributor

If the data is not stored on chain, it will also not be stored in GraphQL. I meant that our deployment tools can easily fetch this information over RMB from the node itself. There is no reason we should store this data on chain.

@Parkers145

Oh I see, I'm tracking now. I had misunderstood your response.

@scottyeager
Author

@DylanVerstraete, the biggest issue with relying on RMB for this functionality is that users are usually searching for a node that matches their desired deployment specifications. This means that front ends like the playground would potentially need to make hundreds or thousands of RMB calls to filter through nodes that might match based on the initial info available from TF Chain.

For example, a user wants a VM with a 1TB disk. There are ~1500 nodes that have 1 TB free SRU. How many of these can actually support a 1TB disk reservation? We can only know by checking them one by one.
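
To make that cost concrete, here is a rough sketch of the filter-then-probe pattern a front end would be forced into. `findNodesWithFreeSRU` and `largestFreeDisk` are stand-ins for a Grid Proxy query and a per-node RMB call; neither is a real API, and both are stubbed so the example runs:

```go
package main

import "fmt"

const oneTB uint64 = 1 << 40

// Stand-in for a Grid Proxy / TF Chain query: nodes whose total free SRU is large enough.
func findNodesWithFreeSRU(min uint64) []uint32 { return []uint32{1, 2, 3} }

// Stand-in for a per-node RMB call returning the largest free block on any single disk.
func largestFreeDisk(nodeID uint32) uint64 { return 512 << 30 }

func main() {
	candidates := findNodesWithFreeSRU(oneTB) // in practice ~1500 candidate nodes
	var matches []uint32
	for _, id := range candidates {
		// One RMB round trip per candidate, just to learn its disk layout.
		if largestFreeDisk(id) >= oneTB {
			matches = append(matches, id)
		}
	}
	fmt.Printf("%d of %d candidates can actually hold a 1 TB disk\n", len(matches), len(candidates))
}
```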

Grid Proxy could potentially be extended to query and cache this data from the nodes, so it can be used in an efficient way by the front end. I think though that this would be a fairly large extension of its functionality. Maybe @xmonader can weigh in from the perspective of the front ends and the proxy.

Of course we should reduce bloat on TF Chain wherever possible. If we can find an agreeable way to provide a nice user experience without putting this data on chain, that's certainly fine.

@LeeSmet
Contributor

LeeSmet commented Feb 1, 2023

To my knowledge, the gridproxy already provides caching of data, so this should not be too hard. But even if this is not the case, we can't just dump it in tfchain. The chain is not meant to be used as a database. IMO it is already doing way too much, and we need to see how to reduce traffic.

All in all, this should be exposed over RMB, and clients will have to either query individual nodes about their relevant disk layout, or query some intermediate tool that aggregates this information instead.

@muhamadazmy
Member

@LeeSmet yeah, I had a similar chat with @brandonpille and we thought providing an RMB function to show capacity per disk is good enough for the farmerbot to do proper planning, and it also avoids storing this information on the chain.

@muhamadazmy
Member

This functionality is now available on devnet (returning capacity per disk over RMB) and should be used in the farmerbot for capacity planning, hence I will close the issue.
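
For reference, a self-contained sketch of the kind of capacity-planning check the farmerbot (or any client) can do once it has per-disk data from a node. The `DiskInfo` shape is the same hypothetical one sketched earlier in the thread, not a real zos type:

```go
package main

import "fmt"

// DiskInfo mirrors the hypothetical per-disk report sketched earlier in the thread.
type DiskInfo struct {
	Device string
	Type   string // "SSD" or "HDD"
	Free   uint64 // bytes free on this disk
}

// canFit reports whether any single disk of the requested type can hold the reservation.
func canFit(disks []DiskInfo, diskType string, want uint64) bool {
	for _, d := range disks {
		if d.Type == diskType && d.Free >= want {
			return true
		}
	}
	return false
}

func main() {
	disks := []DiskInfo{
		{Device: "/dev/sda", Type: "SSD", Free: 600 << 30},
		{Device: "/dev/sdb", Type: "SSD", Free: 900 << 30},
	}
	// ~1.5 TB free in total, but no single disk can hold a 1 TB reservation.
	fmt.Println(canFit(disks, "SSD", 1<<40)) // false
}
```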

backlog automation moved this from To do to Done Feb 23, 2023