Performance considerations

New fields

When a new field is created on a table, there are two possible outcomes:

Glomming - The system searches the physical storage table where the field is being created for an existing column of the same data type and max length. If one is found, a new storage alias entry is created for the field (in the sys_storage_alias table) and no ALTER is executed on the physical storage table. Additional information is available in KB0713626.

Alter table - If the system cannot find a physical column to glom onto, a new column is created directly on the physical storage table, resulting in an online alter. Depending on how large the physical storage table is, an online alter can take time to complete.
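The decision between the two paths can be sketched as a simple lookup. This is a hypothetical model for illustration only; `findGlommableColumn` and the column properties are invented names, not the platform's actual implementation:

```javascript
// Illustrative model of the glomming decision (not actual platform code):
// given the physical columns already present on a storage table, find one
// that a new field of the given type/length could reuse instead of ALTERing.
function findGlommableColumn(existingColumns, newField) {
  for (var i = 0; i < existingColumns.length; i++) {
    var col = existingColumns[i];
    // A column qualifies only if it is not already aliased by another field
    // and matches both the data type and the max length exactly.
    if (!col.inUse &&
        col.dataType === newField.dataType &&
        col.maxLength === newField.maxLength) {
      return col; // glom: a sys_storage_alias entry is created, no ALTER needed
    }
  }
  return null; // no match: fall back to ALTER TABLE (online alter)
}
```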

When creating a field from the UI, if an ALTER is performed, it is possible to end up with a table/field that has been partially created. There are quota rules on every instance that prevent long-running transactions from executing; specifically, the UI transaction quota rule cancels any UI transaction that runs longer than 298 seconds. Users/admins should be mindful of this: any field/table creation from the UI that results in an online alter running longer than 298 seconds will be aborted by the transaction quota rule, leaving metadata records behind for partially created fields. The transaction quota does not apply when creating tables/fields from an update set. The UI Transaction rule can be found under System Definition > Transaction Quota Rules > UI Transactions.

If the table where the field is being created has more than 500,000 rows, it is best to add conditions ensuring that an online alter performed against the table completes without being aborted. To do this, an admin can simply add two OR conditions to the existing quota rule: URL does not contain sys_dictionary OR URL does not contain sys_db_object.

Table/field creation from the UI occurs mainly in two places: sys_dictionary and sys_db_object. That is the reasoning for adding these two conditions: with them in place, the UI Transaction quota rule bypasses UI transactions in which a field or table is being created.

INSTANT, NOCOPY and INPLACE alters are available starting from the Quebec release. Your database has to be upgraded to a version of MariaDB where this is available (10.3.7). To get the database version, navigate to https://<instance>.service-now.com/xmlstats.do?include=database and check the value of the db.product_version tag.
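A quick way to check the reported version against the 10.3.7 minimum (an illustrative helper, not platform code; version strings with a suffix such as "10.3.7-MariaDB" also parse, because parseInt stops at the first non-digit):

```javascript
// Compare a dotted version string against the minimum MariaDB version
// (10.3.7) that supports INSTANT/NOCOPY/INPLACE alters.
function supportsInstantAlter(productVersion) {
  var min = [10, 3, 7];
  var parts = productVersion.split('.').map(function (p) {
    return parseInt(p, 10);
  });
  for (var i = 0; i < min.length; i++) {
    var v = parts[i] || 0; // missing or unparseable segments count as 0
    if (v > min[i]) return true;
    if (v < min[i]) return false;
  }
  return true; // exactly 10.3.7
}
```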

Additional information is available in the MariaDB documentation.

Another problem that may occur when the platform chooses the alter table path is the error "Syntax Error or Access Rule Violation detected by database (Row size too large (> 8126))". It is caused by the MariaDB row size limit of 8126 bytes. To better understand how much data each row can store, refer to the Database field type article.

There are two possible workarounds. First, reduce the number of fields on the affected table (keep the table hierarchy in mind here), especially string fields. Second, migrate some of the fields into Offrow Storage (can be done only by support personnel).

Offrow Storage was introduced to help customers get around the 8126-byte MariaDB limit for the task table. Offrow Storage tables are named using the original storage table's name with an "_offrow" suffix. When an element is migrated into offrow storage, the element's storage column definition (using the element's storage_alias) is copied to the offrow storage table. Then the data is copied from the original table. After sys_storage_alias is updated, the element's original ("on-row") storage column values are set to NULL.

By using one update set, you are able to group table alters. For example, if 10 columns (for one storage table) are created through the UI, that takes 10 ALTERs; but if they are all committed through one update set, they are combined into a single alter. This works only within the scope of one update set: combining multiple update sets using batching will NOT extend this optimization across update sets.

Text indexing

By default, the system creates and maintains text indexes for tables (and their children) that have a text index record (All > System Definition > Text Indexes). Excluded data types: Date, Date/Time, True/False, Workflow. Each data modification raises a text_index event to update the text index. Use the no_text_index=true dictionary attribute to disable indexing for a specific field (the same attribute can be used to disable indexing for a whole table hierarchy).
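The filtering described above can be modeled in a few lines. This is a rough sketch only; the type names and the attribute shape below are illustrative, not the exact dictionary internals:

```javascript
// Rough model of which fields on a text-indexed table actually get indexed:
// excluded data types and fields carrying no_text_index=true are skipped.
// Type names ('date', 'date_time', 'boolean', 'workflow') are illustrative.
var EXCLUDED_TYPES = ['date', 'date_time', 'boolean', 'workflow'];

function indexedFields(fields) {
  return fields.filter(function (f) {
    if (EXCLUDED_TYPES.indexOf(f.type) !== -1) return false;       // excluded data type
    if (f.attributes && f.attributes.no_text_index === 'true') return false; // opted out
    return true;
  });
}
```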

A bulk update of an indexed field will generate a text_index event for each updated row and can trigger an "Event Queue Performance" alert on the ServiceNow Support side. This is especially relevant for reference fields (they are indexed by default).

Async business rules

When an async business rule is triggered, ServiceNow inserts a record into the sys_trigger table without blocking the original transaction (for example, a form submission will not wait for the BR script to execute). A Scheduler Worker picks up and executes the trigger after the action on the record that triggered the BR has completed.

A trigger consumes a Scheduler Worker for as long as it runs. If several users perform similar transactions (or a batch data update/upload occurs; for example, a bulk closure of tasks), we could end up with all Scheduler Workers on all nodes running jobs created by async BRs. This has instance-wide impact and delays the execution of workflows with timers (in reality, almost all workflows), other async BRs, and so on.

It's important to note that a record is inserted into sys_trigger only if the async BR condition evaluates to true. Condition evaluation happens synchronously, so in the condition you have all objects and methods at your disposal as if you were working with an "after" BR (for example, GlideElement.changesTo).
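The flow can be modeled in a few lines (a simplified model, not actual platform code; the plain array stands in for the sys_trigger table):

```javascript
// Minimal model of async BR scheduling: the condition runs synchronously in
// the user's transaction; a trigger record is enqueued only when it is true.
function runAsyncRules(record, rules, triggerQueue) {
  rules.forEach(function (rule) {
    if (rule.condition(record)) { // synchronous, like an "after" BR
      // The script itself is NOT run here; it is deferred to a Scheduler Worker.
      triggerQueue.push({ rule: rule.name, recordId: record.sys_id });
    }
  });
}
```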

Access Control Lists (ACLs)

To write efficient ACL conditions, it's important to know their evaluation sequence:

  1. Requires role
  2. Condition
  3. Script

From a performance perspective, the order is exactly the same (Coincidence? I don't think so): role evaluation is the fastest and the script is the slowest.

Why is the script the slowest? ServiceNow has to invoke the Rhino JavaScript engine to evaluate it. The best practice is to shield scripts behind "Requires role" and "Condition": the script won't even run unless the ACL first matches the role and then matches the condition, sidestepping the scripting overhead entirely when an earlier check fails.
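The shielding effect is plain short-circuit evaluation, sketched here with counters to show that the script layer is never reached when a cheaper layer fails (illustrative code, not the platform's ACL engine):

```javascript
// Evaluate the three ACL layers cheapest-first; counters record what ran.
function evaluateAcl(user, record, acl, counters) {
  counters.role++;
  if (!acl.requiresRole(user)) return false;  // cheapest check first
  counters.condition++;
  if (!acl.condition(record)) return false;   // still no script engine involved
  counters.script++;
  return acl.script(user, record);            // slowest: reached only when needed
}
```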

And one last thing: even in the "Condition", try to avoid dot-walking ("Show related records"). Each dot-walk is an additional query to the database.

System properties vs instance cache

ServiceNow is a clustered platform with multiple nodes working as a single instance. Each node maintains its own cache to avoid excessive database queries for commonly used items. System properties are, unsurprisingly, part of this cache.

So how does the instance keep those caches in sync? The answer is simple: when a property is changed on one node, it tells all the other nodes to dump their properties cache and re-read all property values from the database. This has a small impact on system performance and happens regardless of whether the ignore_cache field is set to true for the specific property.

If ignore_cache is set to false (nowadays the instance default is true), then not only is the properties cache flushed, but the whole system cache on all nodes. The reason is the need to flush any dependencies or stale values in other caches that might be related to the old value of the property that was just changed.

In short: ignore_cache = true > properties cache flush only; ignore_cache = false > full system cache flush.

Cache flushes can be observed in sys_cache_flush table.

The cache can be flushed programmatically using the undocumented gs.cacheFlush('syscache_realform');. In this example, the system flushes only the form layout cache. All possible values can be found on the cache_inspect.do page (use Debug Security to understand how to access it). Called without parameters, it is equivalent to cache.do.

Recommendation: do not use properties for values that change frequently (say, more than once a month).

Burst Worker and sys_trigger.priority

The 3 types of workers that service the sys_trigger queue are:

  • Scheduler Worker - each node has 8 of these to process jobs from the scheduler queue (scheduler workers are also threads, numbered 0 to 7)
  • Burst Worker - a special 9th worker thread that is used only in specific scenarios to ensure the most critical jobs don't get delayed. For a job to run on the Burst Worker, it must have a priority <= 25 and have been queued for 60 seconds (meaning there's a delay and it has been sitting there)
  • Progress Worker - these run on scheduler threads and are designed to handle long-running jobs whose progress/percentage we want to display in the UI (use cases such as upgrades, plugins, update sets, etc.). By functioning as a "wrapper" around the job, the worker can report the activities it's conducting back to the UI to keep you informed of the status.
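The Burst Worker eligibility rule from the list above can be expressed as a simple predicate (an illustrative sketch of the stated criteria, not platform code):

```javascript
// A queued job qualifies for the Burst Worker only when its priority is
// <= 25 AND it has already been sitting in the queue for at least 60 seconds.
function eligibleForBurstWorker(job, nowMs) {
  var waitedSeconds = (nowMs - job.queuedAtMs) / 1000;
  return job.priority <= 25 && waitedSeconds >= 60;
}
```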

You can see the current details of the workers related to the node you're on by visiting the "stats.do" page.