New Release ArangoDB 2.6
We are proud to announce the latest release of ArangoDB, with lots of improvements and many new features. ArangoDB 2.6 is available for download for many different operating systems. The focus of the new release is on performance improvements: for instance, sorting on a string attribute is up to 3 times faster. There are also improvements in the shortest-path implementation and other graph-related AQL queries.
Have a look at some of our previous blog posts, such as Reusable Foxx Apps with Configurations, Document your Foxx Apps with Swagger 2, or the Improved System User Authentication, to learn more about ArangoDB 2.6, and check the manual for a deeper dive into specific features.
Claudius, CEO: “The performance improvements in every area of ArangoDB 2.6 make ArangoDB an effective alternative to other databases. I am very proud of the product and the team, and we can expect much more in the next few months.”
Some of the performance improvements:
- FILTER conditions: simple FILTER conditions we’ve tested are 3 to 5 times faster
- simple joins using the primary index (_key attribute), hash index or skiplist index are 2 to 3.5 times faster
- extracting the _key or other top-level attributes from documents is 4 to 5 times faster
- COLLECT statements: simple COLLECT statements we’ve tested are 7 to 15 times faster
Max, Software Architect: “With ArangoDB 2.6 we accelerate some of our key areas significantly. Everybody benefits from these improvements, especially people who like to use more complex and ambitious queries. Our latest benchmark already shows this because we have used a preview of ArangoDB 2.6.”
Please give ArangoDB 2.6 a try and provide us with your valuable feedback.
Features and Improvements
- front-end: display of query execution time
- front-end: added demo page (only works if demo data is available)
- front-end: renamed query submit button to execute
- front-end: added query explain feature
- removed startup option --log.severity
- added optional limit parameter for AQL function FULLTEXT
- made the fulltext index also index text values that are contained in direct sub-objects of the indexed attribute.
Previous versions of ArangoDB only indexed the attribute value if it was a string. Sub-attributes of the index attribute were ignored during fulltext indexing.
Now, if the index attribute value is an object, the object’s values will each be included in the fulltext index if they are strings. If the index attribute value is an array, the array’s values will each be included in the fulltext index if they are strings.
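Conceptually, the set of strings that end up in the index for a given attribute value under the 2.6 rules can be sketched in plain JavaScript like this (an illustrative sketch only, not the actual server implementation):

```javascript
// Collect the string values that a 2.6 fulltext index would consider
// for an attribute value (illustrative sketch, one level deep, matching
// the "direct sub-objects" behavior described above).
function indexableStrings(value) {
  if (typeof value === "string") return [value];
  if (Array.isArray(value)) {
    return value.filter(function (v) { return typeof v === "string"; });
  }
  if (value !== null && typeof value === "object") {
    return Object.keys(value)
      .map(function (k) { return value[k]; })
      .filter(function (v) { return typeof v === "string"; });
  }
  return []; // non-string scalars were not indexed before 2.6 and still aren't
}

console.log(indexableStrings({ en: "fox", de: "Fuchs" })); // [ 'fox', 'Fuchs' ]
console.log(indexableStrings([ "ArangoDB", 42 ]));         // [ 'ArangoDB' ]
```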
For example, with a fulltext index present on the translations attribute, the following text values will now be indexed:
var c = db._create("example");
c.ensureFulltextIndex("translations");
c.insert({ translations: { en: "fox", de: "Fuchs", fr: "renard", ru: "лиса" } });
c.insert({ translations: "Fox is the English translation of the German word Fuchs" });
c.insert({ translations: [ "ArangoDB", "document", "database", "Foxx" ] });
c.fulltext("translations", "лиса").toArray();       // returns only the first document
c.fulltext("translations", "Fox").toArray();        // returns the first and second documents
c.fulltext("translations", "prefix:Fox").toArray(); // returns all three documents
- added batch document removal and lookup commands:
collection.lookupByKeys(keys)
collection.removeByKeys(keys)
These commands can be used to perform multi-document lookup and removal operations efficiently from the ArangoShell. The argument to these operations is an array of document keys.
Also added HTTP APIs for batch document commands:
- PUT /_api/simple/lookup-by-keys
- PUT /_api/simple/remove-by-keys
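The semantics of the batch operations can be illustrated with a plain-JavaScript sketch that models a collection as an object keyed by _key (the helpers below are illustrative stand-ins, not the actual ArangoDB implementation): keys that match no document are simply skipped, and removals report them as ignored.

```javascript
// Sketch of lookup-by-keys / remove-by-keys semantics over a plain
// object acting as a stand-in for a collection (hypothetical helpers).
function lookupByKeys(docs, keys) {
  var result = [];
  keys.forEach(function (key) {
    if (docs.hasOwnProperty(key)) {
      result.push(docs[key]); // unknown keys are silently skipped
    }
  });
  return result;
}

function removeByKeys(docs, keys) {
  var removed = 0;
  keys.forEach(function (key) {
    if (docs.hasOwnProperty(key)) {
      delete docs[key];
      removed++;
    }
  });
  // unknown keys are counted as ignored rather than raising an error
  return { removed: removed, ignored: keys.length - removed };
}

var docs = { a: { _key: "a" }, b: { _key: "b" } };
console.log(lookupByKeys(docs, ["a", "c"]).length); // 1
console.log(removeByKeys(docs, ["a", "b", "c"]));   // { removed: 2, ignored: 1 }
```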
- properly prefix document address URLs with the current database name for calls to the API method GET /_api/document?collection=... (that method returns partial URLs to all documents in the collection). Previous versions of ArangoDB returned the URLs starting with /_api/ but without the current database name, e.g. /_api/document/mycollection/mykey. Starting with 2.6, the response URLs will include the database name as well, e.g. /_db/_system/_api/document/mycollection/mykey.
- subquery optimizations for AQL queries. This optimization avoids copying intermediate results into subqueries that are not required by the subquery.
- return value optimization for AQL queries. This optimization avoids copying the final query result inside the query’s main ReturnNode.
- allow @ and . characters in document keys, too. This change also led to document keys being URL-encoded when returned in HTTP location response headers.
- added an alternative implementation for AQL COLLECT. The alternative method uses a hash table for grouping and does not require its input elements to be sorted. It will be taken into account by the optimizer for COLLECT statements that do not use an INTO clause.
  If a COLLECT statement can use the hash table variant, the optimizer will create an extra plan for it at the beginning of the planning phase. In this plan, no extra SORT node will be added in front of the COLLECT, because the hash table variant of COLLECT does not require sorted input. Instead, a SORT node will be added after it to sort its output. This SORT node may be optimized away again in later stages. If the sort order of the result is irrelevant to the user, adding an extra SORT null after a hash COLLECT operation will allow the optimizer to remove the sorts altogether.
  In addition to the hash table variant of COLLECT, the optimizer will modify the original plan to use the regular COLLECT implementation. As this implementation requires sorted input, the optimizer will insert a SORT node in front of the COLLECT. This SORT node may be optimized away in later stages. The created plans will then be shipped through the regular optimization pipeline. In the end, the optimizer will pick the plan with the lowest estimated total cost, as usual. The hash table variant does not require an up-front sort of the input and will thus be preferred over the regular COLLECT if the optimizer estimates many input elements for the COLLECT node and cannot use an index to sort them.
  The optimizer can be explicitly told to use the regular sorted variant of COLLECT by suffixing a COLLECT statement with OPTIONS { "method" : "sorted" }. This will override the optimizer’s guesswork and only produce the sorted variant of COLLECT.
- refactored the HTTP REST API for cursors (/_api/cursor) to improve its performance and use less memory. A post showing some of the performance improvements can be found here: http://jsteemann.github.io/blog/2015/04/01/improvements-for-the-cursor-api/
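For example, the sorted COLLECT variant can be requested explicitly like this (a sketch; the users collection and its city attribute are hypothetical):

```aql
FOR doc IN users
  COLLECT city = doc.city OPTIONS { "method" : "sorted" }
  RETURN city
```

Without the OPTIONS clause, the optimizer generates both the hash and the sorted plan when the statement is eligible and picks the one with the lowest estimated cost.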
- simplified return value syntax for data-modification AQL queries. ArangoDB has allowed returning results from data-modification AQL queries since version 2.4. The syntax for this was quite limited and verbose:
FOR i IN 1..10
  INSERT { value: i } IN test
  LET inserted = NEW
  RETURN inserted
The LET inserted = NEW RETURN inserted was required literally to return the inserted documents, and no calculations could be made using the inserted documents. This is now more flexible: after a data-modification clause (e.g. INSERT, UPDATE, REPLACE, REMOVE, UPSERT) there can follow any number of LET calculations. These calculations can refer to the pseudo-values OLD and NEW that are created by the data-modification statements. This allows returning projections of inserted or updated documents, e.g.:
FOR i IN 1..10
  INSERT { value: i } IN test
  RETURN { _key: NEW._key, value: i }
Still, not every construct is allowed after a data-modification clause. For example, functions that may access documents cannot be called.
More information can be found here: http://jsteemann.github.io/blog/2015/03/27/improvements-for-data-modification-queries/
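As an illustration of the relaxed rules, a LET calculation can now post-process the OLD and NEW pseudo-values before returning a projection (a sketch; the users collection and its visits attribute are hypothetical):

```aql
FOR u IN users
  UPDATE u WITH { visits: u.visits + 1 } IN users
  LET diff = NEW.visits - OLD.visits
  RETURN { _key: NEW._key, diff: diff }
```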
- added AQL UPSERT statement. This adds an UPSERT statement to AQL that is a combination of both INSERT and UPDATE/REPLACE. The UPSERT will search for a matching document using a user-provided example. If no document matches the example, the insert part of the UPSERT statement will be executed. If there is a match, the update/replace part will be carried out:

UPSERT { page: "index.html" }                 /* search example */
INSERT { page: "index.html", pageViews: 1 }   /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }       /* update part */
IN pageViews

UPSERT can be used with an UPDATE or REPLACE clause. The UPDATE clause will perform a partial update of the found document, whereas the REPLACE clause will replace the found document entirely. The UPDATE or REPLACE parts can refer to the pseudo-value OLD, which contains all attributes of the found document. UPSERT statements can optionally return values. In the following query, the return attribute found will contain the found document before the UPDATE was applied. If no document was found, found will contain a value of null. The updated result attribute will contain the inserted/updated document:

UPSERT { page: "index.html" }                 /* search example */
INSERT { page: "index.html", pageViews: 1 }   /* insert part */
UPDATE { pageViews: OLD.pageViews + 1 }       /* update part */
IN pageViews
RETURN { found: OLD, updated: NEW }
A more detailed description of UPSERT can be found here: http://jsteemann.github.io/blog/2015/03/27/preview-of-the-upsert-command/
- adjusted default configuration value for --server.backlog-size from 10 to 64
- issue #1231: bug xor feature in AQL: LENGTH(null) == 4. This changes the behavior of the AQL LENGTH function as follows:
  - if the single argument to LENGTH() is null, the result will now be 0. In previous versions of ArangoDB, the result of LENGTH(null) was 4.
  - if the single argument to LENGTH() is true, the result will now be 1. In previous versions of ArangoDB, the result of LENGTH(true) was 4.
  - if the single argument to LENGTH() is false, the result will now be 0. In previous versions of ArangoDB, the result of LENGTH(false) was 5.
  The results of LENGTH() with string, numeric, array or object argument values do not change.
- issue #1298: Bulk import if data already exists
This change extends the HTTP REST API for bulk imports as follows:
When documents are imported and the _key attribute is specified for them, the import can be used for inserting and updating/replacing documents. Previously, the import could be used for inserting new documents only, and re-inserting a document with an existing key would have failed with a unique key constraint violation error.
The above behavior is still the default. However, the API now allows controlling the behavior in case of a unique key constraint error via the optional URL parameter onDuplicate. This parameter can have one of the following values:
- error: when a unique key constraint error occurs, do not import or update the document but report an error. This is the default.
- update: when a unique key constraint error occurs, try to (partially) update the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. Only the attributes present in the import data will be updated; other attributes already present will be preserved. The number of updated documents will be reported in the updated attribute of the HTTP API result.
- replace: when a unique key constraint error occurs, try to fully replace the existing document with the data specified in the import. This may still fail if the document would violate secondary unique indexes. The number of replaced documents will be reported in the updated attribute of the HTTP API result.
- ignore: when a unique key constraint error occurs, ignore this error. There will be no insert, update or replace for the particular document. Ignored documents will be reported separately in the ignored attribute of the HTTP API result.
The result of the HTTP import API will now contain the attributes ignored and updated, which contain the number of ignored and updated documents respectively. These attributes will contain a value of zero unless the onDuplicate URL parameter is set to update or replace (in which case the updated attribute may contain non-zero values) or ignore (in which case the ignored attribute may contain a non-zero value).
To support the feature, arangoimp also has a new command line option --on-duplicate, which can have one of the values error, update, replace, ignore. The default value is error. A few examples for using arangoimp with the --on-duplicate option can be found here: http://jsteemann.github.io/blog/2015/04/14/updating-documents-with-arangoimp/
- changed behavior of
db._query() in the ArangoShell: if the command’s result is printed in the shell, the first 10 results will be printed. Previously, only a basic description of the underlying query result cursor was printed. Additionally, if the cursor result contains more than 10 results, the cursor is assigned to a global variable more, which can be used to iterate over the cursor result. Example:
arangosh [_system]> db._query("FOR i IN 1..15 RETURN i")
[object ArangoQueryCursor, count: 15, hasMore: true]
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]
type 'more' to show more documents
arangosh [_system]> more [object ArangoQueryCursor, count: 15, hasMore: false]
[ 11, 12, 13, 14, 15 ]
- Breaking Changes:
  - AQL command GRAPH_SHORTEST_PATH now only returns IDs and does not extract data any more. It accepts an additional option includeData, an object taking exactly two keys: edges (if set to true, all data stored alongside the edges of each path will be extracted) and vertices (if set to true, all data stored alongside the vertices of each path will be extracted). The default value of both parameters is false.
  - JS module general-graph: all graph measurements that return exactly one value used to return an array containing exactly this one value. Now they return the value directly. Modified functions are: graph._absoluteEccentricity, graph._eccentricity, graph._absoluteCloseness, graph._closeness, graph._absoluteBetweenness, graph._betweenness, graph._radius, graph._diameter.
  - Newly started ArangoDB instances will create the '_graph' collection without waitForSync, so the default behavior of create and delete operations on whole graphs changes:
    - POST /_api/graph used to return HTTP 201 Created by default; it will now return 202 Accepted
    - DELETE /_api/graph/graph-name used to return HTTP 200 by default; it will now return 202 Accepted, unless waitForSync is specified as a parameter or the '_graph' collection's waitForSync attribute was set.
- Improved GRAPH_SHORTEST_PATH computation. This involved a change in the default behavior: the default setting now only returns the distance and the IDs of nodes. We have added an optional boolean parameter includeData; if it is set to true, all documents and edges in the result will be fully expanded. We have also added an optional parameter includePath of type object. It has two optional boolean sub-attributes, vertices and edges. Both can be set individually; the result will include all vertices on the path if includePath.vertices == true and all edges if includePath.edges == true, respectively. So if you want to get exactly the old result back, you have to use GRAPH_SHORTEST_PATH(<graph>, <source>, <target>, {includeData: true, includePath: {edges: true, vertices: true}}). The default behavior is now independent of document size, as the extraction part could be optimized. The internal algorithm to find all paths from one source to several targets has also been massively improved.
- added support for HTTP push aka chunked encoding
- issue #1051: add info whether server is running in service or user mode. This adds a "mode" attribute to the result of HTTP GET /_api/version?details=true. "mode" can have the following values:
  - standalone: server was started manually (e.g. on the command line)
  - service: server is running as a Windows service, in daemon mode or under the supervisor
- increased default value of --server.request-timeout from 300 to 1200 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- increased default value of --server.connect-timeout from 3 to 5 seconds for client tools (arangosh, arangoimp, arangodump, arangorestore)
- added startup option --server.foxx-queues-poll-interval. This startup option controls the frequency with which the Foxx queues manager checks the queue (or queues) for jobs to be executed. The default value is 1 second. Lowering this value will result in the queue manager waking up and checking the queues more frequently, which may increase the server's CPU usage. When not using Foxx queues, this value can be raised to save some CPU time.
- added startup option --server.foxx-queues-system-only. This startup option controls whether the Foxx queue manager checks queue and job entries in the _system database only. Restricting the Foxx queue manager to the _system database means it only has to check the queues collection of a single database, whereas making it check the queues of all databases might result in more work to be done and more CPU time used by the queue manager. The default value is true, so the queue manager will only check the queues in the _system database.
- make Foxx queues really database-specific. Foxx queues were and are stored in a database-specific collection _queues. However, a global cache variable for the queues led to queue names being treated database-independently, which was wrong. Since 2.6, Foxx queue names are truly database-specific, so the same queue name can be used in two different databases for two different queues. Until then, it is advisable to think of queues as already being database-specific, and to use the database name as a queue name prefix to avoid name conflicts, e.g.:

var queueName = "myQueue";
var Foxx = require("org/arangodb/foxx");
Foxx.queues.create(db._name() + ":" + queueName);
- fixed issue #1247: debian init script problems
- multi-threaded index creation on collection load. When a collection contains more than one secondary index, the indexes can be built in memory in parallel when the collection is loaded. The number of threads used for parallel index creation is determined by the new configuration parameter --database.index-threads. If this is set to 0, indexes are built sequentially by the opening thread only, which is equivalent to the behavior of 2.5 and before.
- sped up building the primary index when loading collections
- added count attribute to the parameters.json file. This attribute indicates the number of live documents in the collection at unload time. It is read when the collection is (re)loaded to determine the initial size of the collection's primary index.
- removed remainders of MRuby integration, removed arangoirb
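To illustrate the count attribute mentioned above, a collection's parameters.json might look roughly like this (a sketch only; the surrounding attributes are abbreviated and all values are made up):

```json
{
  "name": "example",
  "waitForSync": false,
  "count": 1337
}
```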
- simplified the controllers property in Foxx manifests. You can now specify a filename directly if you only want to use a single file mounted at the base URL of your Foxx app.
- simplified the exports property in Foxx manifests. You can now specify a filename directly if you only want to export variables from a single file in your Foxx app.
- added support for Node.js-style exports in Foxx exports. Your Foxx exports file can now export arbitrary values using the module.exports property instead of adding properties to the exports object.
- added scripts property to Foxx manifests. You should now specify the setup and teardown files as properties of the scripts object in your manifests.
- updated the joi package to 6.0.8.
- added the extendible package.
- added Foxx model lifecycle events to repositories. See #1257.
