Auto Route stops processing fwd jobs, at Retries=0

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Fri Oct 11, 2013 11:32 am

An installation of PacsOne 6.3.7 (a designated DICOM router) often fails to forward studies to another installation of 6.3.7 (the PACS). On the Job Status page, these jobs are listed under "Failed" and the "Retries" column shows 0, sometimes 1. Local admins have to waste time periodically checking the page and retrying manually (unless doctors alert them about missing studies first).

Example 1, at 12:29:

Code: Select all

	819703
	dicom
	PACS6
	StorageCommitReq
	StorageCommit
	1.2.826.0.1.3680043.2.737.2013.10.9.11.41.56.619
	Immediately
	N/A
	2013-10-09 11:41:56
	2013-10-09 11:42:03
	2013-10-09 11:43:06
	failed
	0
	Time out waiting for data from remote peer
Failed SOP Instance: 1.2.826.0.1.3680043.2.737.2013.10.9.11.41.56.619
	0
Example 2, some other day, at 13:35:

Code: Select all

	823673
	dicom
	PACS6
	forward
	Study
	Patient Name: NAME SURNAME
	Study ID: STUDY_ID
	Accession Number: ACC_N
	Immediately
	N/A
	2013-10-11 10:47:42
	2013-10-11 10:50:00
	2013-10-11 10:51:01
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.2.20131011074746.4682
1.2.392.200036.9110.11004409001.1.1.20131011074715.5626
	0
Sometimes those failed jobs are older than two hours, and still have Retries=0. Retry Interval is zero everywhere which, according to the manual, means "Immediately".

Most jobs, of course, are ordinary study forwarding attempts. The first one is a stgcmt request -- we once thought that enabling them would somehow help with diagnosis; however, they were disabled shortly afterwards, as checking the status of commitments is very inefficient and slows down the web interface.

I never saw Retries=3 which is the default value of the MaximumRetries parameter in registry. Recently I increased MaximumRetries to 100 but still didn't see a value of Retries large enough, which would confirm that the new parameter is read from the registry and acted upon. On a related note, on startup PacsOne.exe should log non-default values of those configurable parameters, so that admins immediately see whether their attempts to reconfigure were correct.
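The startup-logging suggestion above could look roughly like the following. This is a minimal sketch in Python, not PacsOne code: the `DEFAULTS` table, the function names, and the way the configured values are obtained are all assumptions made for illustration. Only the parameter name MaximumRetries and its default of 3 come from this thread.

```python
import logging

# Hypothetical table of built-in defaults; only MaximumRetries=3 is from this thread.
DEFAULTS = {"MaximumRetries": 3}


def non_default_settings(configured: dict, defaults: dict) -> dict:
    """Return the known settings whose configured value differs from the default."""
    return {
        name: value
        for name, value in configured.items()
        if name in defaults and value != defaults[name]
    }


def log_non_defaults(configured: dict, defaults: dict, logger: logging.Logger) -> None:
    """Write one log line per overridden setting so admins can confirm it was read."""
    for name, value in sorted(non_default_settings(configured, defaults).items()):
        logger.info("%s = %r (default %r)", name, value, defaults[name])


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    # In a real service these values would be read from the registry at startup.
    log_non_defaults({"MaximumRetries": 100}, DEFAULTS, logging.getLogger("startup"))
```

With something like this in place, a single line in the log after a restart would show whether the changed value was picked up at all.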

A similar issue: http://pacsone.net/forum/viewtopic.php?p=7000. Network congestion could indeed be the cause, but I still don't understand why there are no automatic retries. It is also possible that even those Retries=1 cases are occasional manual retries by the admins, which fail again because the destination is busy at that moment.

pacsone
Site Admin
Posts:3149
Joined:Tue Sep 30, 2003 2:47 am

Post by pacsone » Fri Oct 11, 2013 5:13 pm

The 1st job you posted (Job ID 819703) is NOT an automatic routing job but a Storage Commitment Request, so there won't be any retries for those jobs.

The 2nd job (Job ID 823673) is a forwarding job but the job level is "STUDY" instead of the usual "IMAGE", so not sure whether it's a manual forwarding job or an automatic routing job. Can you post the screenshot of the Auto Route page where the automatic routing rule for this job was defined (Destination AE Title of "PACS6")?

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Fri Oct 11, 2013 7:27 pm

There are a lot of per-modality rules. The only difference between them is the waiting time.

Just a thought: how will the router behave if a CT study contains a couple of SR files?

Image

Image

pacsone
Site Admin
Posts:3149
Joined:Tue Sep 30, 2003 2:47 am

Post by pacsone » Fri Oct 11, 2013 8:28 pm

The 1st auto-route rule in the screenshot (the one with the wildcard '*') would make all the other rules useless, because it matches ALL modality types with the wildcard selector and everything else, e.g., source/destination AE, schedule, etc., is exactly the same. So the 1st rule is pretty much the only rule that will be matched, and all the other rules would be ignored by PacsOne Server, so you can simply delete them.

Now, to troubleshoot why no retry was attempted for this auto-route rule, can you check whether no retries were attempted for ALL failed auto-route jobs, or just some of them?

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Fri Oct 11, 2013 10:09 pm

My bad. I collected only a few cases. Also most cases were handled by admins without alerting me. Next week we all will try to save the "failed" list somewhere before re-forwarding.

However, I'm sure that a bunch of 15-16 failed jobs seen yesterday (from a few minutes to about 2 hours old) were without any retries.
pacsone wrote: "The 1st auto-route rule in the screenshot (the one with the wildcard '*') would make all the other rules useless, because it matches with ALL modality types with the wildcard selector and everything else"
That is, rules are processed only in the order listed on the page? Is it possible to move this rule to the end so that it matches only after no other modality matches? The local staff obviously tried to accomplish something like that.

pacsone
Site Admin
Posts:3149
Joined:Tue Sep 30, 2003 2:47 am

Post by pacsone » Mon Oct 14, 2013 4:32 pm

Currently when PacsOne Server checks for matches in the defined automatic routing rules, it will stop after finding the 1st matched rule and will ignore the rest of the defined rules, so if you have multiple matching rules defined, the 1st matching rule from the "select * from autoroute" query results will be selected.
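The first-match behaviour described above can be illustrated with a small sketch. This is Python written for illustration only, not PacsOne's implementation: the `Rule` fields, the use of `fnmatchcase` for the wildcard, and the waiting times are assumptions, since the real `autoroute` table layout is not shown in this thread.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional, Sequence


@dataclass(frozen=True)
class Rule:
    # Hypothetical fields; the real autoroute columns are not shown in the thread.
    modality: str      # e.g. "CT", "MR", or the wildcard "*"
    destination: str   # destination AE Title
    wait_minutes: int  # "Wait N Minutes and Forward the Entire Study"


def first_match(rules: Sequence[Rule], modality: str) -> Optional[Rule]:
    """Return the first rule whose modality pattern matches; later rules are ignored."""
    for rule in rules:
        if fnmatchcase(modality, rule.modality):
            return rule
    return None


# Made-up waiting times, only to show which rule wins.
wildcard_first = [Rule("*", "PACS6", 2), Rule("CT", "PACS6", 8)]
wildcard_last = list(reversed(wildcard_first))

print(first_match(wildcard_first, "CT").wait_minutes)  # wildcard rule shadows the CT rule
print(first_match(wildcard_last, "CT").wait_minutes)   # CT rule is reached first
```

In this sketch the wildcard rule only acts as a fallback when it comes last in the sequence. Whether the order can be controlled in PacsOne depends on the order in which the query returns rows, and a SQL query without an ORDER BY clause does not guarantee any particular order.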

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Fri Oct 18, 2013 2:08 pm

Well, here's the data I promised, for 4 days in total. Redacted a bit for brevity and anonymity. The time when each snapshot was collected is also included.

As you see, jobs seen for the first time have Retries=0. PAT006 was retried manually one time. PAT002, PAT005 -- two times.

To me it's obvious that PacsOne never retries automatically. What else can you suggest? Probably debug logging during the next week, though logfiles will surely become enormous -- already 20-30 MB each day. At the current level (Information) they show almost nothing about the forwarding jobs.

Code: Select all

***** 10-14 15:16 *****

	825627
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT001
Study ID: 140
	Immediately
	N/A
	2013-10-14 15:12:01
	2013-10-14 15:14:08
	2013-10-14 15:15:10
	failed
	0
	Failed SOP Instances:
1.3.51.0.7.11357585624.7808.26180.46642.9345.13792.31734
	0

***** 10-15 10:13 *****

	826040
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT002
Study ID: 6975-76
Accession Number: TRAUM.
	Immediately
	N/A
	2013-10-15 09:42:04
	2013-10-15 09:44:17
	2013-10-15 09:45:21
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.2.20131015064218.3287
1.2.392.200036.9110.11004409001.1.1.20131015064147.5311
	0

	826029
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT003
Study ID: 1
	Immediately
	N/A
	2013-10-15 09:40:06
	2013-10-15 09:42:15
	2013-10-15 09:43:19
	failed
	0
	Failed SOP Instances:
1.2.392.200039.105.2.2810.10.20131015.100716.452
(...)
1.2.392.200039.105.2.2810.10.20131015.100601.358
	0

	826106
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT004
Study ID: 1
	Immediately
	N/A
	2013-10-15 09:58:02
	2013-10-15 10:00:13
	2013-10-15 10:01:17
	failed
	0
	Failed SOP Instances:
1.2.392.200039.105.2.204792.10.20131015.94830.202
1.2.392.200039.105.2.204792.10.20131015.94846.155
	0

	826109
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT005
Study ID: 20131015095707
Accession Number: 20131015095707
	Immediately
	N/A
	2013-10-15 09:58:56
	2013-10-15 10:02:08
	2013-10-15 10:03:12
	failed
	0
	Failed SOP Instances:
1.2.826.0.1.3680043.2.4852.20131015.095802578.819001256
	0

***** 10-15 10:20 *****

	826040
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT002
Study ID: 6975-76
Accession Number: TRAUM.
	Immediately
	N/A
	2013-10-15 10:14:03
	2013-10-15 10:14:08
	2013-10-15 10:15:12
	failed
	1
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.2.20131015064218.3287
1.2.392.200036.9110.11004409001.1.1.20131015064147.5311
	0

	826109
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT005
Study ID: 20131015095707
Accession Number: 20131015095707
	Immediately
	N/A
	2013-10-15 10:14:03
	2013-10-15 10:14:08
	2013-10-15 10:15:12
	failed
	1
	Failed SOP Instances:
1.2.826.0.1.3680043.2.4852.20131015.095802578.819001256
	0

	826144
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT006
	Immediately
	N/A
	2013-10-15 10:11:56
	2013-10-15 10:14:08
	2013-10-15 10:15:12
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.115.4537636.1381818733.0.550.4
(...)
1.2.840.113619.2.115.4537636.1381818733.0.540.4
	0

***** 10-15 10:21 *****

	826144
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT006
	Immediately
	N/A
	2013-10-15 10:20:13
	2013-10-15 10:20:17
	2013-10-15 10:21:20
	failed
	1
	Failed SOP Instances:
1.2.840.113619.2.115.4537636.1381818733.0.550.4
(...)
1.2.840.113619.2.115.4537636.1381818733.0.540.4
	0

	826040
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT002
Study ID: 6975-76
Accession Number: TRAUM.
	Immediately
	N/A
	2013-10-15 10:20:13
	2013-10-15 10:20:17
	2013-10-15 10:21:20
	failed
	2
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.2.20131015064218.3287
1.2.392.200036.9110.11004409001.1.1.20131015064147.5311
	0

	826109
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT005
Study ID: 20131015095707
Accession Number: 20131015095707
	Immediately
	N/A
	2013-10-15 10:20:13
	2013-10-15 10:20:17
	2013-10-15 10:21:20
	failed
	2
	Failed SOP Instances:
1.2.826.0.1.3680043.2.4852.20131015.095802578.819001256
	0

***** 10-15 12:56 *****

	826433
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT007
Study ID: 145
	Immediately
	N/A
	2013-10-15 12:28:07
	2013-10-15 12:30:15
	2013-10-15 12:31:20
	failed
	0
	Failed SOP Instances:
1.3.51.0.7.13505985144.46726.845.33900.19185.27030.26568
	0

***** 10-15 15:00 *****

	826671
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT008
Study ID: 31499-500
Accession Number: KP
	Immediately
	N/A
	2013-10-15 14:21:49
	2013-10-15 14:24:00
	2013-10-15 14:25:01
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.203.4.2147483647.1381835869.532362
1.2.840.113619.2.203.4.2147483647.1381835929.893186
	0

	826642
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT009
Study ID: 7047
Accession Number: K.P.
	Immediately
	N/A
	2013-10-15 14:14:05
	2013-10-15 14:16:17
	2013-10-15 14:17:17
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.1.20131015111420.5156
	0

	826646
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT010
Study ID: 1
Accession Number: 07711
	Immediately
	N/A
	2013-10-15 14:16:53
	2013-10-15 14:24:07
	2013-10-15 14:25:08
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.4.5.20021.30000013101503355134300000071.4.512
(...)
1.3.12.2.1107.5.4.5.20021.30000013101503355134300000065.4.512
	0

	826555
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT011
Study ID: 1
	Immediately
	N/A
	2013-10-15 13:27:48
	2013-10-15 13:30:00
	2013-10-15 13:31:02
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.4.4.1630.30000013101504094131200000453
	0

***** 10-16 12:28 *****

	827529
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT012
Study ID: 103
	Immediately
	N/A
	2013-10-16 11:51:45
	2013-10-16 11:53:59
	2013-10-16 11:55:00
	failed
	0
	Failed SOP Instances:
1.3.51.0.7.2554739700.2693.64328.49149.40377.46787.15839
1.3.51.0.7.11141882003.44288.31817.41481.65362.28155.50274
	0

	827343
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT013
	Immediately
	N/A
	2013-10-16 10:57:58
	2013-10-16 11:00:08
	2013-10-16 11:01:09
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.299.2535.1381329328.0.16.576
(...)
1.2.840.113619.2.299.2535.1381329328.0.14.576
	0

***** 10-16 16:58 *****

	827953
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT014
Study ID: 31648-49
Accession Number: PA
	Immediately
	N/A
	2013-10-16 16:09:50
	2013-10-16 16:12:01
	2013-10-16 16:13:05
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.203.4.2147483647.1381928782.890516
1.2.840.113619.2.203.4.2147483647.1381928737.723911
	0

	827801
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT015
Study ID: 31631-32
Accession Number: KP
	Immediately
	N/A
	2013-10-16 13:47:44
	2013-10-16 13:49:59
	2013-10-16 13:51:03
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.203.4.2147483647.1381920276.163922
1.2.840.113619.2.203.4.2147483647.1381920242.43609
	0

***** 10-16 ??:?? *****

	827034
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT022
Study ID: 253
	Immediately
	N/A
	2013-10-16 09:00:04
	2013-10-16 09:02:17
	2013-10-16 09:03:21
	failed
	0
	Failed SOP Instances:
1.3.51.0.7.12498535070.13167.57931.43145.31698.61065.56453
1.3.51.0.7.12795666144.25717.27210.48646.13888.8802.7120
	0

	827676
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT023
Study ID: Tyrimas
Accession Number: 13101600232996
	Immediately
	N/A
	2013-10-16 12:44:07
	2013-10-16 12:46:15
	2013-10-16 12:47:17
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.221.104435732.1381905755.0.319.4
(...)
1.2.840.113619.2.221.104435732.1381905755.0.328.4
	0

	827671
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT024
Study ID: 1
	Immediately
	N/A
	2013-10-16 12:43:58
	2013-10-16 12:46:08
	2013-10-16 12:47:10
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9116.7.8.6.40595906.6.0.2022978714442958
(...)
1.2.392.200036.9116.7.8.6.40595906.6.0.2022988554471122
	0 

	827725
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT025
Study ID: 1
	Immediately
	N/A
	2013-10-16 13:12:56
	2013-10-16 13:15:10
	2013-10-16 13:16:12
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9116.7.8.6.40595906.6.0.2023006717662471
(...)
1.2.392.200036.9116.7.8.6.40595906.6.0.2023013714581283
	0

	827720
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT026
Study ID: 1
	Immediately
	N/A
	2013-10-16 13:06:58
	2013-10-16 13:14:07
	2013-10-16 13:15:09
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.4.5.140228.30000013101604271260900000182.512
	0

***** 10-17 09:37 *****

	828249
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT016
Study ID: 1
	Immediately
	N/A
	2013-10-17 09:17:54
	2013-10-17 09:20:02
	2013-10-17 09:21:04
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.2.33.37042.3.2013101708534124572802884
(...)
1.3.12.2.1107.5.2.33.37042.3.2013101709212052356103613
	0

***** 10-17 10:10 *****

	828392
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT017
Study ID: 1
	Immediately
	N/A
	2013-10-17 10:06:01
	2013-10-17 10:08:12
	2013-10-17 10:09:14
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.4.4.1630.30000013101504094131200001570
1.3.12.2.1107.5.4.4.1630.30000013101504094131200001574
	0

***** 10-17 11:59 *****

	828604
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT018
Study ID: 0000026880
Accession Number: 6080
	Immediately
	N/A
	2013-10-17 11:23:59
	2013-10-17 11:26:07
	2013-10-17 11:27:11
	failed
	0
	Failed SOP Instances:
1186.1208.11.10.169.10.20131017112328
1186.1208.11.10.169.10.201310171118460
	0

***** 10-17 14:57 *****

	829047
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT019
Study ID: 7190-92
Accession Number: K.P.
	Immediately
	N/A
	2013-10-17 14:39:49
	2013-10-17 14:42:04
	2013-10-17 14:43:06
	failed
	0
	Failed SOP Instances:
1.2.392.200036.9110.11004409001.1.1.20131017113924.5316
(...)
1.2.392.200036.9110.11004409001.1.4.20131017114005.1098
	0

	829006
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT020
Study ID: I24076
	Immediately
	N/A
	2013-10-17 14:33:52
	2013-10-17 14:42:04
	2013-10-17 14:43:06
	failed
	0
	Failed SOP Instances:
1.2.840.113619.2.199.32640.4146.4852.1381994200.11511.119
(...)
1.2.840.113619.2.199.32640.4146.4852.1381994200.11511.147
	0

***** 10-17 16:51 *****

	829153
	dicom
	PACS6
	forward
	Study
	Patient Name: PAT021
Study ID: 1
	Immediately
	N/A
	2013-10-17 16:36:02
	2013-10-17 16:38:12
	2013-10-17 16:39:14
	failed
	0
	Failed SOP Instances:
1.3.12.2.1107.5.4.4.1630.30000013101712061864000000133
1.3.12.2.1107.5.4.4.1630.30000013101712061864000000137
	0

pacsone
Site Admin
Posts:3149
Joined:Tue Sep 30, 2003 2:47 am

Post by pacsone » Sun Oct 20, 2013 3:34 pm

It looks like the current version of PacsOne Server only retries automatic routing jobs at IMAGE level. The failed routing jobs you listed were all at STUDY level, so PacsOne will not retry them, on the assumption that STUDY-level routing jobs are less likely to fail with the "Wait N Minutes and Forward the Entire Study" option enabled.

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Sun Oct 20, 2013 9:04 pm

What a pity. Any plans to change that in the near future?

pacsone
Site Admin
Posts:3149
Joined:Tue Sep 30, 2003 2:47 am

Post by pacsone » Mon Oct 21, 2013 4:23 pm

Ok, we'll enable Retries for forwarding jobs at all levels, including PATIENT, STUDY and SERIES as well as the current IMAGE level.

But keep in mind that the reason why we have the "Wait N Minutes and Forward the Entire Study" option is to avoid the potential failures when PacsOne forwards each image of a study individually (more network overhead). A forwarding job that fails at the STUDY level therefore suggests network latency issues, and enabling Retries at the STUDY level won't help if network congestion is the source of the failures.
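The difference between the current and the planned behaviour can be summarized in a short sketch. This is illustrative Python, not PacsOne's code: `ForwardJob`, `should_retry` and the `retry_levels` parameter are hypothetical names, and the only facts taken from this thread are the four job levels, the default MaximumRetries of 3, and that currently only IMAGE-level jobs are retried.

```python
from dataclasses import dataclass
from typing import Sequence

ALL_LEVELS = ("PATIENT", "STUDY", "SERIES", "IMAGE")


@dataclass
class ForwardJob:
    level: str        # one of ALL_LEVELS
    status: str       # e.g. "failed"
    retries: int = 0  # value shown in the "Retries" column


def should_retry(job: ForwardJob, max_retries: int = 3,
                 retry_levels: Sequence[str] = ("IMAGE",)) -> bool:
    """Retry only failed jobs at an eligible level that still have attempts left."""
    return (job.status == "failed"
            and job.level in retry_levels
            and job.retries < max_retries)


study_job = ForwardJob(level="STUDY", status="failed")
print(should_retry(study_job))                           # current behaviour: IMAGE only
print(should_retry(study_job, retry_levels=ALL_LEVELS))  # planned: all levels
```

Under the IMAGE-only policy a failed STUDY-level job stays at Retries=0 indefinitely, which matches the job lists posted earlier in the thread.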

tburba
Posts:50
Joined:Fri Apr 23, 2010 5:02 pm

Post by tburba » Mon Oct 21, 2013 8:02 pm

Thank you in advance.

The network in question, especially the receiving PACS, might experience congestion, but it is temporary. I myself caught one failed job within about a minute, and the retry succeeded shortly afterwards. Studies with no missing images are of utmost importance; if we learn that the current infrastructure just can't handle the load, it will be a strong incentive to upgrade.
