
Windchill Cluster Environment over Monolithic

psingh-8
2-Guest


Hello everyone,

We are currently running a monolithic Windchill 10.2 M030 installation and facing a lot of performance issues. We are thinking of moving to a cluster environment and need your help with a few pointers:

  • What performance improvements can we expect from moving from a monolithic to a cluster environment?
  • Which cluster configuration will give the best performance? I was thinking of 2+2: 1 master + 1 active failover and 2 slaves.
  • Which load balancer is recommended?
  • Should we cluster Cognos as well?

Thanks

Pardeep

5 REPLIES

  • What performance improvements can we expect from moving from a monolithic to a cluster environment?

It totally depends on your user load. Performance issues can have different causes, and the solution could be fixing a wrong configuration or scaling up resources and architecture. If your monolithic environment is maxed out and background processes, reporting, search, etc. are eating away all the resources and affecting user transactions, then clustering will probably be a good option. Moreover, with a monolithic setup you are always at risk of a single point of failure, and you really don't want publishing and other background processes to bog down the server.

  • Which cluster configuration will give the best performance? I was thinking of 2+2: 1 master + 1 active failover and 2 slaves.

Again, it comes down to the number of active users you are expecting on the server, and you have to size the servers to meet that. Since you are on 10.2, you can opt for a failover master as well.

  • Which load balancer is recommended?

I have used both Pen and F5 load balancers. F5 is a commercial LTM and offers far more features.

  • Should we cluster Cognos as well?

If you are dealing with large data sets and reporting is one of your core functions, then Cognos on a dedicated node is ideal.

Thank you

Binesh Kumar

Barry Wehmiller

Hi Binesh,

Thank you for the reply.

The main aim of moving to a cluster is indeed to provide 24x7 availability. Another reason is that we still have a lot of migration to do, so we can add a node just for migration until it is completed. We are also facing a lot of performance issues with or without the migration running, and we are hoping that moving to a cluster will get rid of some of them. Things like GC suspension are causing a lot of trouble, and we already have 16 GB of heap space for each method server (MS). Increasing this will increase the GC pause time even further, and we cannot add any more method servers. Maybe with an even distribution of load, things will improve. The load is around 300-400 concurrent heavy users.

The other performance issue we are facing is on Windows Server 2008, where half of the memory is being consumed by the metafile and, for some reason, class loading takes too much time, especially in WinNTFileSystem methods like getBooleanAttributes(). We are not sure whether this will go away with a cluster; maybe we need to work with Microsoft on this.
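(For reference, a minimal sketch of how one could confirm where that time is going: take a few thread dumps of a method server JVM and search them for WinNTFileSystem frames. jstack is standard JDK tooling; the process id and file name below are placeholders, not values from this thread.)

    REM capture a thread dump of the suspect MethodServer JVM (replace <pid>)
    jstack -l <pid> > methodserver_dump.txt
    REM count how many threads are sitting in WinNTFileSystem calls
    findstr /C:"WinNTFileSystem" methodserver_dump.txt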

I have used both Pen and F5 load balancers. F5 is a commercial LTM and offers far more features.

Can you please point me to a document that discusses configuring a load balancer for Windchill in some detail?

If you are dealing with large data sets and reporting is one of your core functions, then Cognos on a dedicated node is ideal.

The load on Cognos is not very high at the moment, but I guess we have the option to cluster Cognos, which will make sure it is available all the time. So maybe we don't need a dedicated node for it yet.

Another question is the hardware configuration. The current monolithic setup has 300 GB of memory, which is almost sufficient at the moment (if the metafile issue is resolved). How much memory would be sufficient for 4 nodes (2+2)? We have 32 cores on the current machine; should we keep the same for each node, or can we decrease (or should we increase) that?

Thanks

Pardeep

Hello Pardeep,

We tried using one node for data migration and it works well. If you are using queue-based loading, or if you have indexing/publishing triggered on load, this will put additional load on the server, and users might see delays in search and in the availability of newly published file types.

As for GC, did you set the ParallelGCThreads startup parameter and configure ConcMarkSweep? This should bring down the GC time. If you see a lot of System.gc() calls, you can even disable explicit GC.
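(A minimal sketch of the kind of HotSpot options being discussed; the flag names are standard JVM options, but the values and the decision to apply them to each method server's Java command line are assumptions to adapt and test in your own environment.)

    -Xms16g -Xmx16g                      # fixed 16 GB heap, as mentioned above
    -XX:+UseConcMarkSweepGC              # CMS collector to shorten stop-the-world pauses
    -XX:ParallelGCThreads=6              # parallel GC worker threads; tune to the node's cores
    -XX:+DisableExplicitGC               # ignore System.gc() calls from application code
    -XX:+PrintGCDetails -Xloggc:gc.log   # log GC activity so the effect can be verified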

The metafile memory leak is always a pain; I have seen administrators write scripts to get past it. I always vote for Linux/UNIX when it comes to this.

If you have not found the root cause of the performance issues, then I would recommend identifying where the bottleneck or misconfiguration is, because you might face the same problem in a cluster as well. You can use PSM; it is a good tool for getting to the bottom of this kind of issue.

I don't think PTC has a document that details the configuration for F5 or Pen; you can look it up on Google. It is nothing but a basic HTTP load balancer. You can have Windchill health-check URLs fired from the load balancer to ensure it does not route requests to a non-functional node. Some critical points for F5 are documented here: https://support.ptc.com/appserver/cs/view/solution.jsp?n=141028
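(As an illustration of that health-check idea, a minimal probe the load balancer or a watchdog script could fire at each node; the host name and ping URL are assumptions, so substitute whatever lightweight Windchill URL your release actually exposes.)

    # fail the node if the lightweight URL does not answer within 5 seconds
    curl -fsS --max-time 5 "http://wcnode1.example.com/Windchill/servlet/WindchillGW/wt.httpgw.HTTPServer/ping" \
        || echo "wcnode1 failed health check"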

You can use the sizing guide as a baseline for sizing the resources. What you need is two nodes that can each handle 200 concurrent users. If you needed to cater to 1000 active users, you could choose to distribute them over 2 servers sized for 500 users each or 4 servers sized for 250 users each. While using the sizing guide as a reference, remember to add enough buffer to handle the usage load (pattern and roles) of your user community. If you don't expect an increase in users, then having 32 cores/300 GB for each node is definitely overkill. (I wish I could have that kind of resources in our environments.)
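(To make the arithmetic concrete for the load mentioned above; the 25% headroom is only an illustrative assumption, not a figure from the sizing guide.)

    peak concurrent users      ~400 (upper end of the 300-400 estimate above)
    users per node (2 nodes)   400 / 2 = 200
    with ~25% headroom         200 x 1.25 = 250  ->  size each node for ~250 users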

Thank you,

Binesh Kumar

Barry Wehmiller

We tried using one node for data migration and it works well. If you are using queue-based loading, or if you have indexing/publishing triggered on load, this will put additional load on the server, and users might see delays in search and in the availability of newly published file types.

Hi Binesh,

Thank you for your reply once again.

We are using separate queues and workers for loading data, so that is not much of a worry, but yes, search is a problem while loading is happening.

As for GC, did you set the ParallelGCThreads startup parameter and configure ConcMarkSweep? This should bring down the GC time. If you see a lot of System.gc() calls, you can even disable explicit GC.

Yes, we have already disabled explicit GC. Currently the parallel thread count is 6. That brought the GC time down from 30 seconds to 15-20 seconds compared to when the thread count was 2, but I don't think that's good enough. I need to look more into the CMS parameters.

The metafile memory leak is always a pain; I have seen administrators write scripts to get past it. I always vote for Linux/UNIX when it comes to this.

Nothing like UNIX!

That is indeed a real pain. I believe Microsoft has provided something for it, but they have asked us to run a tool called RAMMap to establish that we have the issue. The fun part is that the tool does not respond on the server, maybe because the memory is too large for it, and unless we can establish that it is a memory leak, our IT will not allow us to deploy the fix. Here is the link to the Microsoft case:

https://support.microsoft.com/en-us/kb/976618

PSM is an interesting tool and we are already using it; it was because of PSM that we were able to find a few of the performance issues, and we use it to monitor the application.

You can use the sizing guide as a baseline for sizing the resources.

300 GB/32 cores is definitely overkill and we are certainly not thinking of going ahead with that. I will go through the sizing once again once I have a POC ready. Thanks for the suggestions, by the way; you have made the sizing decision almost clear for me.

As for the load balancer, I have never configured one, so it is unknown territory for me and I will have to go through a few documents. What I hear is that a lot of PTC customers are using F5 BIG-IP; I will check how it would work out for us. I also need to check with IT whether they have any preference or recommendations for the load balancer.

Can Windows Failover Clustering also be useful somewhere alongside the load balancer, or do we not need it at all?

Thank you so much for all the help.

Pardeep

Hi Pardeep,

I think it will be a good idea for you to try ConcMarkSweep and run a MULG. I have gotten slightly better results with it. F5 is fairly easy to configure from the Windchill side; make sure that the timeouts and buffer sizes match what you have in Apache.
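(As an illustration of that timeout alignment, the Apache httpd directives worth comparing against the F5 virtual-server and TCP profile settings; the directive names are standard Apache, but the values shown are assumptions to check against your own configuration.)

    Timeout 600              # request timeout; the F5 idle timeout should not be shorter
    KeepAliveTimeout 15      # keep-alive idle time between requests on a connection
    LimitRequestBody 0       # max upload size; the F5 should allow at least as much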

Are you using VMs? If yes, most hypervisors offer failover to alternate hardware in the event of a hardware failure. Apart from that, I don't see a reason to use Microsoft clustering in 10.2. I remember some of the large implementations on 10.1 or earlier had the master cache configured in a Windows cluster to eliminate the SPOF.

Thank you

Binesh Kumar

Barry Wehmiller
