Performance Tuning FME

From fmepedia



The information here is designated one of the Top 20 FME Tips that are the most cool, interesting or useful.





Here's some useful information on performance tuning your translation in FME Workbench.





Interpreting a Log File

Before being able to tune a workspace it's vital to understand how to read an FME log file. Without this knowledge a user will often jump to incorrect conclusions about a translation, and start looking for performance issues in the wrong places.


Useful Log File Indicator

When trying to speed up a translation it is always beneficial to check the log file to determine exactly where the time is being spent.

Tip #1: Click the little stopwatch button to the right of the log pane in Workbench to ensure timings are turned on in your log.



Log File Processing Time

The first thing to note in the log is that the time reported in seconds is FME processing (CPU) time, which may not be the same as the overall length of the process. For example, here the timestamps in the elapsed time column show the process took over 6 minutes, but the Total field reports that FME used only about 25 seconds.

Elapsed Time       |  Total| Incremental CPU (secs)
2006-07-10 14:43:06|    8.5|  0.0|
2006-07-10 14:43:13|    8.8|  0.3|
2006-07-10 14:46:29|   18.0|  9.1|
2006-07-10 14:49:29|   25.8|  7.9|
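The comparison between wall-clock time and logged CPU time can be checked with a few lines of script. The following is a minimal sketch, assuming log rows in the pipe-delimited layout of the excerpt above (timestamp, total CPU seconds, incremental CPU seconds); the function names are illustrative and not part of FME.

```python
from datetime import datetime

# Sample rows copied from the log excerpt: timestamp | total CPU | incremental CPU
LOG_ROWS = [
    "2006-07-10 14:43:06| 8.5| 0.0|",
    "2006-07-10 14:43:13| 8.8| 0.3|",
    "2006-07-10 14:46:29| 18.0| 9.1|",
    "2006-07-10 14:49:29| 25.8| 7.9|",
]

def parse_row(row):
    """Split one log row into (timestamp, total_cpu_secs, incremental_cpu_secs)."""
    stamp, total, incremental = [part.strip() for part in row.split("|")[:3]]
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), float(total), float(incremental)

def summarize(rows):
    """Return (wall_clock_seconds, final_total_cpu_seconds) for a list of rows."""
    parsed = [parse_row(r) for r in rows]
    wall = (parsed[-1][0] - parsed[0][0]).total_seconds()
    return wall, parsed[-1][1]

wall_secs, cpu_secs = summarize(LOG_ROWS)
print(f"wall clock: {wall_secs:.0f}s, FME CPU total: {cpu_secs:.1f}s")
```

Note that the wall-clock figure only covers the span between the first and last rows supplied, so an excerpt that starts part-way through a log will understate the true elapsed time.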


Tip #2: Check out this FAQ for more information on why logged FME processing time is not the whole story, and why it matters.



Underlying Functionality and How it is Logged

Because FME works by pushing features through the workspace on an individual basis (not a group - see this FAQ for more info) it is not possible to give exact timings for every individual transformer. Therefore a lot of time doesn't get logged separately but is lumped together under the next function that does support timing.

Tip #3: Check out this FAQ for an example of how FME doesn't always show the time taken for each transformer, why it matters and what it might look like in a log file.



Temporary Directory Location

One of the first items of importance in the log file is the temporary directory. You'll see this reported as something like this (timings removed for clarity)...

INFORM|FME Configuration: Temporary directory is `C:\DOCUME~1\xxxx\LOCALS~1\Temp'


You'll also see a line commenting on the amount of disk space available in that directory...

INFORM|System Status: 37700MB of disk space available in the FME temporary directory


When FME runs a large, multi-dataset translation it often requires a lot of temporary disk space. This is particularly true when running a Dataset Fanout (this FAQ explains why). So the amount of available disk space is important, but where performance is concerned we're more interested in the speed of all this disk activity.

Tip #4a: Where possible set your temporary directory to point to the fastest disk you have available. This FAQ tells you how to use FME_TEMP to set a different temporary directory.

Tip #4b: Where possible don't set your temporary directory on the same disk that the operating system uses; FME might be slowed down by the operating system writing to the same disk at the same time.

Tip #4c: Where possible set the temporary directory to a disk that has a large amount of free space - it won't improve the speed but it may prevent a large translation from failing due to a lack of disk space.

Tip #4d: Why not try that old standby - the RAMDisk (http://en.wikipedia.org/wiki/Ram_disk)?! This sort of technology is staging a minor comeback, and pointing the FME temporary directory to a RAMDisk would have obvious speed benefits. Without recommending one technology over another, this company (http://www.superspeed.com/desktop/ramdisk.php) has software to help set up such a solution.
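As an illustration of tips 4a to 4c, the sketch below picks the candidate directory with the most free space and places it in an FME_TEMP environment variable for a child process. It assumes FME_TEMP is read from the environment, as the FAQ referred to above describes; the candidate directories and function names are hypothetical. Free space is only a proxy here: which disk is fastest still has to be judged by the user.

```python
import os
import shutil
import tempfile

def pick_temp_dir(candidates):
    """Return the candidate directory whose disk has the most free space."""
    return max(candidates, key=lambda path: shutil.disk_usage(path).free)

def fme_environment(temp_dir, base_env=None):
    """Copy the environment and point FME_TEMP at the chosen directory."""
    env = dict(os.environ if base_env is None else base_env)
    env["FME_TEMP"] = temp_dir
    return env

# Hypothetical candidates; in practice these would be folders on different disks.
candidates = [tempfile.gettempdir(), os.getcwd()]
chosen = pick_temp_dir(candidates)
env = fme_environment(chosen)
print("FME_TEMP =", env["FME_TEMP"])
```

The resulting dictionary could then be handed to whatever launches the translation, for example through the env argument of subprocess.run.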



Where is the Time Going?

The next part of the log file relates to reading data.

Remember we said above that FME works by pushing features through the workspace on an individual basis? Well it starts processing each feature as soon as it is read from the source data. It doesn't read all features then start processing. Therefore it's similarly difficult to look at a log file and try to calculate the time spent reading the data because workspace processing time will be lumped in with it.

See this example. 'Emptying factory pipeline' in this example marks the point at which we've finished reading data.

2006-02-03 11:37:47| 342.7|  0.5|INFORM|Emptying factory pipeline

Here it took 342.7 seconds (about 6 minutes) to read the source data. But, as you now know, this includes time spent processing the features within the workspace. When all the transformers were removed from this workspace we got...

2006-02-03 14:44:43|  66.5|  0.3|INFORM|Emptying factory pipeline

Wow! Only one minute instead of six. This tells us that 80% of the time was spent processing the data and only 20% reading it. In this case the user thought the reading was the bottleneck, but this shows it was the transformation. The user should therefore check the workspace to make sure it is as efficient as possible and that there are no unnecessary transformers.

Tip #5: If you're worried about the reading performance of a workspace disconnect the readers from the transformers in Workbench and run the translation again. Then compare the log files. It may be that a lot of the time you assume was spent reading data is actually used by the workspace transformers and this will show where to concentrate your performance efforts.
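The arithmetic behind that 80/20 split is simple enough to script when comparing the two log files. A minimal sketch, using the two 'Emptying factory pipeline' totals quoted above (the function name is illustrative):

```python
def split_read_vs_transform(total_with_transformers, total_readers_only):
    """Return (read_share, transform_share) as fractions of the full run."""
    read_share = total_readers_only / total_with_transformers
    return read_share, 1.0 - read_share

# 342.7s with the transformers connected, 66.5s with the readers alone
read_share, transform_share = split_read_vs_transform(342.7, 66.5)
print(f"reading: {read_share:.0%}, transformers: {transform_share:.0%}")
```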



Database Log File Indicators

Databases are an important component of many datasets and the log file will help us determine both how good our database performance is and how well FME is interacting with the database.

The link on tip 2 above provides a good example relating to a prefetch query carried out on an Oracle database…

       2004-05-14 17:18:52| 476.1|  0.0|INFORM|Started SQL cache prefetch
       2004-05-14 17:25:10| 476.2|  0.1|INFORM|Finished SQL cache 

Note the difference in actual time on the left; you can see that the time between issuing the SQL prefetch and its completion is a little over six minutes. However, FME logs only 0.1 seconds of CPU time. From this we can say that the remaining time was spent by Oracle retrieving the data using the query it was given.

To get that time down the user would need to look at how the Oracle database is structured and how the query is written. Perhaps the field searched on isn’t indexed? Maybe the query supplied isn’t as efficient as it could be?

Tip #6a: Check the log carefully to find out how much database-related time is spent outside of FME and see if you need to improve your database efficiency.
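One way to find database-related time spent outside FME is to compare the wall-clock gap between two consecutive log lines with the incremental CPU time logged on the second. The sketch below does this for the two prefetch lines quoted above; it assumes the same pipe-delimited layout, and the function name is illustrative.

```python
from datetime import datetime

LINES = [
    "2004-05-14 17:18:52| 476.1|  0.0|INFORM|Started SQL cache prefetch",
    "2004-05-14 17:25:10| 476.2|  0.1|INFORM|Finished SQL cache",
]

def external_wait(prev_line, line):
    """Wall-clock seconds between two log lines, minus the CPU seconds
    FME logged for the second line."""
    def fields(text):
        stamp, _total, incremental = [p.strip() for p in text.split("|")[:3]]
        return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), float(incremental)
    (t0, _), (t1, incremental) = fields(prev_line), fields(line)
    return (t1 - t0).total_seconds() - incremental

waited = external_wait(LINES[0], LINES[1])
print(f"{waited:.1f}s spent waiting outside FME")
```

A large value points at the database (or the network) rather than at the workspace.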


Speaking of indexes (indices?) - the matter of whether a table is indexed can have a great effect on the performance when writing data to it.

Writing to an unindexed table is quick because the database has no overhead work to do.

Writing data to a table that is indexed takes a lot longer because - for each row committed - the database has to index the data immediately.

As above, the reported CPU time doesn't change because it is the database server - and not FME - that is doing the indexing work.

Tip #6b: Where possible, drop indexes before doing a bulk load into a table, then recreate them after the load is complete. It is often quicker than leaving the index in place during the data load.
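The drop, load, recreate pattern looks roughly like this. The sketch uses Python's built-in sqlite3 module purely as a stand-in for whatever database is being loaded; the table and index names are hypothetical, and the exact statements will differ by database.

```python
import sqlite3

def bulk_load(connection, rows):
    """Drop the index, load all rows in one transaction, then rebuild the index."""
    with connection:  # commits once when the block finishes
        connection.execute("DROP INDEX IF EXISTS idx_parcels_owner")
        connection.executemany("INSERT INTO parcels (id, owner) VALUES (?, ?)", rows)
        connection.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, owner TEXT)")
conn.execute("CREATE INDEX idx_parcels_owner ON parcels (owner)")
bulk_load(conn, [(i, f"owner-{i % 100}") for i in range(10_000)])
print(conn.execute("SELECT COUNT(*) FROM parcels").fetchone()[0], "rows loaded")
```

The index is built once over the finished table instead of being updated for every row.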


Related to this is the difference between truncating a table and dropping it. FME has settings to do either, but when you truncate a table the index remains and subsequent data loading is slower. When you drop a table first, a side effect is that all indexes are also dropped, hence data can be written faster because no indexing is taking place.

Tip #6c: Consider using the option to Drop a table, rather than Truncate, in order to get better performance from a bulk data load.


In the example illustrating tip 6a, you can see a prefetch for a cache. A cache is used by the Joiner transformer. The Joiner matches records to graphic features. When FME reads a matched database record it will hold it in a cache. For subsequent features this cache is checked for a match before FME checks the database. The advantage there is that database records that are matched by multiple features do not cause FME to do a database read each time because the information is already held in memory (cached). This makes the join quicker and results in less network traffic.

Here is a good example...

@Relate: Database query statistics for table `JOINER:MY_TABLE': 
7 queries made of which 0 were sequential duplicates 
and 1 hit the record cache of 3 records (14% overall cache hit)


Firstly this doesn't explicitly state how many records were matched, but we can make a good guess that it was four. Three features matched records, all with differing IDs, and FME added these records to the cache. The fourth matching feature didn't have to query the database because it hit one of the records held in cache. That's where the 14% comes in, by the way. There were seven queries and one of these matched a cached record (1/7 = 14%). So FME has automatically reduced network traffic on this query by 14%.


The sequential duplicates part, by the way, indicates how many features had identical key IDs. For example...

@Relate: Database query statistics for table `JOINER:MY_TABLE': 
7 queries made of which 3 were sequential duplicates 
and 2 hit the record cache of 2 records (71% overall cache hit)


Here there were two hits on the cache, but also three duplicate features. Duplicate features don't need a database query, provided they are sequential, so 2 (cache hits) + 3 (duplicates) = 5, and 5/7 = 71%.
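The percentage can be recomputed from the statistics text itself. This is a minimal sketch that assumes the wording of the @Relate message shown above; the regular expression and function name are illustrative.

```python
import re

STATS = """@Relate: Database query statistics for table `JOINER:MY_TABLE':
7 queries made of which 3 were sequential duplicates
and 2 hit the record cache of 2 records (71% overall cache hit)"""

PATTERN = re.compile(
    r"(\d+) queries made of which (\d+) were sequential duplicates\s+"
    r"and (\d+) hit the record cache of (\d+) records"
)

def overall_cache_hit(stats_text):
    """Return (sequential duplicates + cache hits) / queries as a fraction."""
    match = PATTERN.search(stats_text)
    queries, duplicates, hits, _cached = map(int, match.groups())
    return (duplicates + hits) / queries

print(f"{overall_cache_hit(STATS):.0%} of queries avoided a database read")
```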


So caching affects performance, but what can a user do to help? Well there are two settings that can be applied within the Joiner.

The first setting is cache size. Usually only a subset of records is cached. The cache size setting specifies how many records this subset will hold. Once the cache is filled, new records can only be added by dropping existing ones. Therefore the larger the number, the more records will be held in memory and the fewer database reads will occur.

Obviously the size of the setting will depend on how many records you have, how often they will be matched by an individual record and how much memory your system contains. At a certain point it will be more efficient to read the database regardless, if the cache is holding so many records your system runs out of memory.

Tip #6d: With Joiner transformers set a cache size that is appropriate for the size of your dataset and the number of matches that are likely to be made in that cache.
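To see why cache size matters, here is a small sketch of a fixed-size record cache. It assumes a least-recently-used eviction policy, which is an assumption made for illustration only: the text above does not say which records FME drops when the cache is full. The class and function names are hypothetical.

```python
from collections import OrderedDict

class RecordCache:
    """Fixed-size record cache; evicts the least recently used record
    (an assumed policy, chosen for illustration)."""

    def __init__(self, max_records, fetch_from_db):
        self.max_records = max_records
        self.fetch_from_db = fetch_from_db    # callable: key -> record
        self.records = OrderedDict()
        self.db_reads = 0

    def get(self, key):
        if key in self.records:
            self.records.move_to_end(key)     # mark as recently used
            return self.records[key]
        self.db_reads += 1                    # cache miss: go to the database
        record = self.fetch_from_db(key)
        self.records[key] = record
        if len(self.records) > self.max_records:
            self.records.popitem(last=False)  # drop the oldest entry
        return record

def count_db_reads(keys, cache_size):
    """Run a sequence of lookups and report how many reached the database."""
    cache = RecordCache(cache_size, fetch_from_db=lambda k: {"id": k})
    for key in keys:
        cache.get(key)
    return cache.db_reads

keys = [1, 2, 3, 1, 2, 3, 1, 2, 3]
print(count_db_reads(keys, cache_size=2), count_db_reads(keys, cache_size=3))
```

With three distinct keys cycling through a cache of two, every lookup misses; a cache of three needs only the first three database reads.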


A second cache-related setting is the prefetch. Instead of filling the cache with records as they are matched, the cache can be preloaded (i.e. filled with a specific set of data before matching takes place) by the user issuing a prefetch query. This prefetch query can select an entire table or a selected part of a table which is most likely to be matched by the feature attributes.

For example, a number of FME features of type 'roads' require a database match. The database table (myrecords) has a field (record_type) with a number of values; roads, highways, avenues, streets. The FME features will only ever be matched to where record_type=roads so the overall join process would be much more efficient if the following prefetch was issued...

select * from myrecords where record_type = 'roads'

Tip #6e: Where Joiner transformers will match only on a known subset of records within a table it will be more efficient to prefetch that subset of records before matching takes place.


NB: It doesn't matter if a required record is not in the prefetch - FME will just go directly to the database to get it. Also, the cache size is only used in conjunction with a prefetch when that prefetch is NOT exhaustive, i.e. has a WHERE clause. So 'select * from mytable' as a prefetch will cause the cache size to be ignored because the entire set of records is already being held by FME. But 'select * from mytable where type=mytype' will make use of the cache because the prefetch query has not fetched the entire set of records.
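The prefetch-then-fallback behaviour described above can be sketched as follows. Python's built-in sqlite3 module stands in for the real database, the table contents are made up, and the function names are hypothetical; only the prefetch query itself is taken from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myrecords (id INTEGER PRIMARY KEY, record_type TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO myrecords VALUES (?, ?, ?)",
    [(1, "roads", "A1"), (2, "roads", "B2"), (3, "highways", "H3"), (4, "streets", "S4")],
)

def prefetch(connection, query):
    """Preload a cache (dict keyed on id) from a prefetch query."""
    return {row[0]: row for row in connection.execute(query)}

def lookup(connection, cache, record_id, stats):
    """Check the cache first; fall back to the database for records not prefetched."""
    if record_id in cache:
        stats["cache_hits"] += 1
        return cache[record_id]
    stats["db_reads"] += 1
    row = connection.execute(
        "SELECT * FROM myrecords WHERE id = ?", (record_id,)
    ).fetchone()
    if row is not None:
        cache[record_id] = row
    return row

stats = {"cache_hits": 0, "db_reads": 0}
cache = prefetch(conn, "select * from myrecords where record_type = 'roads'")
for feature_id in [1, 2, 1, 3]:
    lookup(conn, cache, feature_id, stats)
print(stats)
```

Only the lookup for id 3, which the prefetch did not cover, reaches the database.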



Memory Availability

3GB Switch

By default, 32-bit Windows restricts the memory available to a single process to 2 GB. FME runs as a single, single-threaded process and so falls prey to this restriction. When it exceeds the available memory, either the system will crash or FME will need to start caching features to disk, which has a very negative effect on performance.

2 GB is not a large number given the size of current datasets. However, you can increase the amount of memory available to 3 GB by setting an operating system boot switch (the /3GB switch). An article on the Safe web site (http://www.safe.com/support/resources/3gb/index.php) describes how to do this. Obviously, you need to have a computer with at least 3 GB of RAM installed before this setting would make any difference.

You can also use a 32-bit FME running on a 64-bit workstation, which gives access to 4 GB of RAM.

You might also want to consider using FME 64-bit running on a 64-bit processor. There are restrictions on some of the supported formats on FME 64-bit. Also, to realize the true benefits of 64-bit applications, it is recommended that you DOUBLE the amount of RAM you would normally have, i.e. 8 GB of RAM minimum.

Tip #7: Increasing the amount of available memory using the /3GB switch will make large-scale translations run faster, and permit some translations that would previously fail due to a lack of memory.



Features in Memory

Remember that we said that FME pushes features through the workspace one at a time? Well that's not always the case. While some transformers in Workbench operate on one feature at a time (feature based) others need to work on groups of features (group based). Group-based transformers are the ones that process multiple features simultaneously; for example intersecting many line features to produce a topological network.

Obviously, any transformer that works on a group of features must hold them all in memory at a single time to do so.

So one issue is that if you have multiple group-based transformers strung together in a workspace, particularly when they are in separate streams (parallel connections), then you are potentially storing multiple copies of the data at any one time. Therefore you're using up vital system resources and potentially slowing the translation because it ends up caching data to disk instead.

Tip #8a: Obviously if you need a certain arrangement of transformers then you must use that arrangement, but be aware that multiple group-based transformers can eat up memory very quickly, and try to avoid the situation if at all possible.
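The difference between the two kinds of transformer can be pictured with Python generators. This is only an analogy, not FME code: features are plain dictionaries and both function names are made up. The feature-based step passes each feature straight through, whereas the group-based step has to hold the whole group before it can emit anything.

```python
def feature_based(features):
    """Handles one feature at a time; nothing is accumulated."""
    for feature in features:
        yield {**feature, "length_km": feature["length_m"] / 1000}

def group_based(features):
    """Needs every feature before it can emit anything, so it holds them all."""
    held = list(features)                  # the whole group sits in memory here
    total = sum(f["length_m"] for f in held)
    for feature in held:
        yield {**feature, "share": feature["length_m"] / total}

source = ({"id": i, "length_m": 100.0 * (i + 1)} for i in range(4))
results = list(group_based(feature_based(source)))
print(results[0])
```

Two group-based steps fed in parallel from the same source would each build their own held list, which is the memory duplication the text warns about.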


For FME2009 the FME_CACHED_OBJECTS_HINT keyword has been deprecated in favour of automated memory management techniques.

Tip #8b: Sit back, relax, and watch as FME handles memory in a way that will maximize performance!



Is Your Attribute Really Necessary?

During the translation - as we've noted several times above - FME will either be holding your data in memory or caching it to a disk. Obviously, the smaller the dataset the less memory used and the better the performance, and this includes the number of attributes.

One particular problem would be carrying around spatial data as attributes. Spatial database formats - for example Oracle or GeoMedia - usually store geometry within a field in the database; for example GEOM. When FME reads the data it converts the GEOM field into FME-style geometry and drops the field from the data.

But it's usually possible to store geometry inside a number of fields. Sometimes you wish to create a backup copy, and sometimes the original application creates copies for its own purposes. FME will only convert one field into geometry, leaving any others as attributes: very large and complex attributes that take up a great deal of system resources.

One user we assisted had just this problem. A compress function in the user's GIS, instead of simply compressing the original geometry field, created an entirely new field. When FME read the data it used the compressed field for the geometry, but also read the original uncompressed data as a plain attribute. This caused a major slowdown, but by simply applying an AttributeRemover transformer at the start of the translation, the excess geometry column could be removed before it reached any group-based transformers, and the translation performance vastly increased.

Another type of attribute to beware of is a List. A list can carry many, many sets of attributes, which is a big drain on resources. For example, use a Joiner to join a feature to 1000 records and you have a list with 1000 sets of records. This is bad enough, but if you explode the list and keep all of the original attributes, then you're getting 1000 features each with 1000 sets of attributes!

Tip #9: Only carry through the translation any geometry and attributes you intend to be available on the output. Remove any excess items as early in the translation process as possible.
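As a rough picture of why early removal helps, the sketch below models a feature as a dictionary, strips an unwanted attribute, and only then explodes a list attribute. It is an analogy in plain Python, not FME code; the attribute names and helper functions are hypothetical.

```python
def drop_unneeded(feature, keep):
    """Return a copy of the feature holding only the attributes named in keep."""
    return {name: value for name, value in feature.items() if name in keep}

def explode(feature, list_name):
    """Turn one feature with a list attribute into one feature per list element,
    copying every other attribute onto each output feature."""
    others = {k: v for k, v in feature.items() if k != list_name}
    return [{**others, **element} for element in feature[list_name]]

feature = {
    "id": 7,
    "GEOM_UNCOMPRESSED": "x" * 100_000,   # stand-in for a bulky leftover geometry column
    "records": [{"owner": f"owner-{i}"} for i in range(1000)],
}
slim = drop_unneeded(feature, keep={"id", "records"})
exploded = explode(slim, "records")
print(len(exploded), exploded[0])
```

Had the bulky attribute still been present at the explode step, it would have been carried on each of the 1000 output features.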



Let the database do the work!

Wherever possible let the database do the work. The FME Oracle reader and most other database readers support both full SQL Statements and SQL WHERE clauses. Use SQL Joins instead of using the FeatureMerger transformer, if possible. Create a database materialised view for even better performance and to simplify your workspace (although sometimes DBAs won't allow you to do this).

For ArcSDE, SQL Statements are only supported for non-spatial tables. For spatial tables use: sdetable -o create_view to create a view that contains a spatial column in the join. The ArcSDE help has useful tips on how to do this, see the ArcGIS Help - Using Database Views: http://webhelp.esri.com/arcgisdesktop/9.2/index.cfm?TopicName=Using_database_views

You can create complex table joins using a combination of sdetable -o create_view followed by SQL ALTER VIEW.

Tip #10: FME can improve performance in some cases by handing off processing to a database.
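For illustration, this is the kind of statement that hands the join and the filtering to the database. The sketch uses Python's built-in sqlite3 module as a stand-in for Oracle or another database; the tables and columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE roads  (road_id INTEGER PRIMARY KEY, name TEXT, owner_id INTEGER);
    CREATE TABLE owners (owner_id INTEGER PRIMARY KEY, owner_name TEXT);
    INSERT INTO roads  VALUES (1, 'Main St', 10), (2, 'Oak Ave', 20), (3, 'Elm Rd', 10);
    INSERT INTO owners VALUES (10, 'City'), (20, 'Province');
""")

# The join and the WHERE filter both run inside the database, so only the
# rows that are actually wanted come back to the client.
joined = conn.execute("""
    SELECT r.road_id, r.name, o.owner_name
    FROM roads r JOIN owners o ON o.owner_id = r.owner_id
    WHERE o.owner_name = 'City'
    ORDER BY r.road_id
""").fetchall()
print(joined)
```

The same SELECT could be supplied to a reader as a full SQL statement, so the workspace receives already-joined rows.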



Writer Order: maximizing a hidden performance improvement

When you have multiple writers in a workspace the data for the first writer just gets written straightaway, whereas subsequent writers get their data cached for later writing. This helps performance in itself, but also makes the first writer in the Navigator pane - the order of which you can control by right-click > move up - more efficient than any other.

Tip #11: When you have multiple writers in a workspace, always ensure the one getting the largest amount of data is the first writer in the list. Need an example? This FAQ tells you all you need to know.



Mathematical Calculations: speed them up with TCL

Sometimes a complex mathematical formula is most easily calculated by splitting the expression up into smaller parts and calculating these parts within individual ExpressionEvaluator transformers. However, chaining together ExpressionEvaluators in this way is not the most efficient way of processing data.

The reason is that FME uses attribute values in a TCL script not by reading them from commands within the script, but by recreating the script for each feature with its relevant values embedded (note that this is my very loose explanation of what is undoubtedly a more complex issue).

The point is that the TCL code gets recompiled for each feature in each ExpressionEvaluator, and chaining a series of these transformers together just compounds the problem.

Tip #12a: When you have multiple ExpressionEvaluator transformers in a workspace, consider condensing them into a single ExpressionEvaluator to cut down on TCL calls and compiling.


Another option is to replace all of the ExpressionEvaluator transformers with a single TCL script. This might sound daunting, but can be relatively simple compared to the previous idea of condensing a tricky algorithm into a single expression.

The TCLCaller transformer is a great way to do this, but remember performance is optimized by manipulating attributes in TCL through the FME_GetAttribute and FME_SetAttribute functions that are provided specifically for this purpose.

Tip #12b: When you have multiple ExpressionEvaluator transformers in a workspace, consider replacing these with a single TCLCaller transformer that contains all of the expressions within a single procedure.
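The cost being described can be mimicked in Python, as an analogy only (the real transformers use TCL, and FME_GetAttribute and FME_SetAttribute are not shown here). In the chained version each step rebuilds its expression text with the feature's values embedded and evaluates it afresh; in the single-procedure version one function, compiled once, reads the attributes directly. Function and attribute names are made up.

```python
import math

features = [{"a": 3.0, "b": 4.0}, {"a": 5.0, "b": 12.0}]

def chained(feature):
    """Three separate steps, each building and evaluating its own expression text."""
    f = dict(feature)
    f["a2"] = eval(f"{f['a']} * {f['a']}")          # expression text rebuilt per feature
    f["b2"] = eval(f"{f['b']} * {f['b']}")
    f["hyp"] = eval(f"({f['a2']} + {f['b2']}) ** 0.5")
    return f

def single_procedure(feature):
    """One procedure, compiled once, reading attribute values directly."""
    return {**feature, "hyp": math.hypot(feature["a"], feature["b"])}

for feature in features:
    print(chained(feature)["hyp"], single_procedure(feature)["hyp"])
```

Both routes give the same answer; the second avoids re-parsing expression text for every feature and leaves no intermediate attributes behind.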



Monitoring Performance

It's often worth using a Performance Monitoring tool (such as PerfMon (http://perfmon.sourceforge.net/)) to log the CPU and memory usage of a process.


Break it Down

Don't bite off more than you can chew. If you have a huge amount of data to process, you may want to consider dividing your processing by some kind of grouping such as region. This way you don't have to do joins across your entire dataset all at once.

For example, you could script a Where clause to select only the data from each of Canada's 10 provinces one at a time, so that only roughly 10 - 20% of the data is processed at any one time by the FME engine. Or you could do successive spatial extent queries. Still, this would ultimately allow the entire country to be processed. The script to call the workspace would only need to be called 10 times, each time passing the name of the province to be processed to a runtime parameter that was in turn embedded in a SQL or Where clause statement within the workspace.
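A calling script along those lines might look like the sketch below. It assumes the workspace exposes a runtime parameter that can be set on the command line in the form --NAME value; the workspace file name (provinces.fmw), the parameter name (PROVINCE) and the executable name are all hypothetical. The script only prints the commands, as a dry run.

```python
PROVINCES = [
    "Alberta", "British Columbia", "Manitoba", "New Brunswick",
    "Newfoundland and Labrador", "Nova Scotia", "Ontario",
    "Prince Edward Island", "Quebec", "Saskatchewan",
]

def build_commands(workspace, parameter, values, executable="fme"):
    """Return one argument list per value, passing it as a runtime parameter."""
    return [[executable, workspace, f"--{parameter}", value] for value in values]

commands = build_commands("provinces.fmw", "PROVINCE", PROVINCES)
for command in commands:
    print(" ".join(command))   # dry run; swap for subprocess.run(command, check=True)
```

Running the commands one after another keeps only one province's data in the FME engine at a time.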
