Discussion:
FRS in error state - help!
Dawn NM
2004-10-19 21:25:35 UTC
I have a Windows Server 2003 domain with servers as replication targets.

I am receiving the following errors on one of the servers:
1. Event ID: 13506 The FRS failed a consistency check (Qkey!=QUADZERO) in
"QHashInsertLock:" at line 696.
2. Event ID: 13555 The FRS is in an error state. Files will not replicate
to or from one or all of the replica sets on this computer until the
following steps are performed:
a. stop and re-start ntfrs (done)
b.
(4-a) If any of the DFS alternates or other replica sets hosted by this
server do not have any other replication partners then copy the data under
its share or replica tree root to a safe location.
(4-b) net stop ntfrs
(4-c) rd /s /q c:\winnt\ntfrs\jet
(4-d) net start ntfrs
(4-e) Copy the data from step (4-a) above to the original location after
the service has initialized (5 minutes is a safe waiting time).

3. Event ID: 13502 The FRS is stopping
4. Event ID: 13504 The FRS stopped without cleaning up.

Repeated over and over.

I performed the required steps listed in #2a and #2b at 9:24 am today. It
moved all the files and directories into NTFRS_xxx folders. For two of the
five folders (the most critical ones), I then moved the files back into their
original directories. At 13:43 it went into an error state again while I was
working on a very large directory.

Please help. Should I perform the steps for #2 and leave the files in the
NTFRS_xxx folders it creates? Is there something else that could be wrong?

Thank you,

Dawn
Glenn L
2004-10-20 04:41:10 UTC
I'm not sure what step 2b is; your post does not say.

The 13506 is an assertion error.
These can only be recovered by doing a restore of the replica set.
This is likely the result of a code defect (albeit a rare occurrence). I
know MS had a code fix for this one that was included in SP3 for W2K.
Even if you called them and got them to fix this, you would still have to go
through the restore.

Here is the process you must follow.
Stop the FRS service on this server.
Move all the data from NTFRS_Preexisting back into its original location.
(since you indicated some data is in the pre-existing folder)
Go to HKLM\SYSTEM\CurrentControlSet\Services\NtFrs\Parameters\Backup/Restore\Process
at Startup in the registry.
Edit the "BurFlags" value and give it a hexadecimal value of D2.
Exit the registry editor.
Start the FRS service.
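
For anyone who would rather script the registry step than click through the editor, here is a minimal sketch using Python's standard winreg module. It assumes the usual NtFrs key path and a REG_DWORD value; the function and constant names are made up for illustration, and "non-authoritative restore" is simply the common name for the D2 setting. It does not stop or start the service for you.

```python
import sys

# Registry location of the FRS restore flag. "Backup/Restore" is a single key
# name that happens to contain a forward slash.
BURFLAGS_KEY = (
    r"SYSTEM\CurrentControlSet\Services\NtFrs\Parameters"
    r"\Backup/Restore\Process at Startup"
)
BURFLAGS_VALUE = "BurFlags"
NONAUTHORITATIVE_RESTORE = 0xD2  # the "D2" value from the steps above


def set_burflags(value=NONAUTHORITATIVE_RESTORE):
    """Write BurFlags as a REG_DWORD under HKLM (hypothetical helper).

    Stop the NTFRS service before calling this and start it again afterwards.
    Administrator rights are required.
    """
    if sys.platform != "win32":
        raise OSError("BurFlags can only be set on a Windows host")
    import winreg  # standard library, available on Windows only

    with winreg.OpenKey(
        winreg.HKEY_LOCAL_MACHINE, BURFLAGS_KEY, 0, winreg.KEY_SET_VALUE
    ) as key:
        winreg.SetValueEx(key, BURFLAGS_VALUE, 0, winreg.REG_DWORD, value)
```

The intended order is: net stop ntfrs, call set_burflags(), then net start ntfrs.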

When it starts, it WILL move all data into the NTFRS_preexisting folder.
It will wipe the original FRS database (much the same as step 4c in your
thread)
It will then join the replica set and proceed to synchronize all the data.
Any changed files on an upstream neighbor will replicate to this server.
Any duplicate files will be moved from the pre-existing folder (rather than
copied across the network) to the original directory.
This process can take a long time depending on the number of files,
processing power, and, to a lesser extent, available bandwidth. You obviously
should perform this operation during a window when the data on this replica
will not be accessed, say over the weekend.

If there is anything left over in the pre-existing, you must analyze those
files to determine if they are new or not.
If they are new, you may simply copy them into the replica.
If they are not, then you must compare the two to find out which copy is the
good (current) one. If the good copy is in the pre-existing folder, then copy
it over the bad one in the replica.

When all is done and you know you have all the good data in the replica, you
may safely delete anything left in the pre-existing.
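
The leftover check described above can be roughed out in a script. This is only a sketch, assuming the pre-existing folder mirrors the directory layout of the replica root; the function names are hypothetical. It sorts each leftover file into "new" (no counterpart in the replica), "identical" (same contents) or "differs". A hash comparison cannot tell you which of two differing copies is the current one, so those still need a manual look (timestamps, opening the files).

```python
import hashlib
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def classify_leftovers(preexisting_root, replica_root):
    """Group files left in the pre-existing folder by how they compare
    with the file at the same relative path in the replica."""
    preexisting_root = Path(preexisting_root)
    replica_root = Path(replica_root)
    report = {"new": [], "identical": [], "differs": []}
    for leftover in sorted(preexisting_root.rglob("*")):
        if not leftover.is_file():
            continue
        relative = leftover.relative_to(preexisting_root)
        counterpart = replica_root / relative
        if not counterpart.is_file():
            report["new"].append(relative)        # safe to copy in
        elif file_digest(leftover) == file_digest(counterpart):
            report["identical"].append(relative)  # nothing to do
        else:
            report["differs"].append(relative)    # needs a manual decision
    return report
```

Anything reported as "identical" can be ignored; "new" files are candidates to copy into the replica.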

I worked with a customer who had a replica set of ~200,000 files totalling
~75 gigs.
This process took 5 days to complete.
At some point a line is crossed where it would be quicker to move all the
data completely out of the replica, and perform the D2 thereby forcing all
the data in the replica set to copy across the network.
In retrospect it may have been quicker to do it this way rather than the
bandwidth-optimizing way above. 75 GB at 1.5 Mbps works out to roughly
4.6 days.
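
The arithmetic behind that estimate, as a small sketch. It assumes decimal units (1 GB = 10^9 bytes, 1 Mbps = 10^6 bits per second), a fully used link and no protocol overhead, so a real transfer would take longer. The function name is just for illustration.

```python
def transfer_days(size_gb, link_mbps):
    """Days needed to push size_gb gigabytes over a link_mbps link,
    ignoring protocol overhead and assuming the link is fully used."""
    bits = size_gb * 1e9 * 8                # gigabytes -> bits
    seconds = bits / (link_mbps * 1e6)      # bits / (bits per second)
    return seconds / 86400                  # seconds -> days


print(f"{transfer_days(75, 1.5):.2f} days")  # prints "4.63 days"
```

That lands in the same range as the figure quoted above and as the 5 days the move-based process actually took.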

I hope I have not confused you.
--
Glenn L
CCNA, MCSE 2000, MCSE 2003 + Security
Jill Zoeller [MSFT]
2004-10-20 23:14:50 UTC
Dawn, we've been trying to reproduce this problem. Can you tell me whether
your W2K3 servers are running the latest FRS hotfix and if you can reproduce
this problem?
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Dawn NM
2004-10-21 14:49:03 UTC
Hi Jill,

I believe we are running the latest FRS hotfix. If you tell me what to look
for I can tell you for sure.

I don't want to reproduce the problem, as this is on our production machine
with over 200,000 files and 185 GB of data.

However, this problem did occur twice. The error reappeared after I began
moving files from a particular directory (200,000 files, 42.2 GB) from the
NTFRS_preexisting folder back into the original folder.

I fixed it yesterday by following the steps listed again and disabling
replication on the large directory. I would like to reenable replication on
the directory, but I am concerned that that is what caused the problems.

Feel free to email me directly at ***@hotmail.com and we can go further
into specifics.

Dawn
Jill Zoeller [MSFT]
2004-10-21 15:45:18 UTC
Dawn, the W2K3 hotfix is described in
http://support.microsoft.com/?id=823230. You can compare your version of
Ntfrs.exe with the version described in this article.

Are you able to work with our Product Support Services on this issue? They
are in the best position to walk you through recovery, and at the same time
they can work with you to gather data that can help us pinpoint the problem.
--
This posting is provided "AS IS" with no warranties, and confers no rights.
Dawn NM
2004-10-21 16:37:05 UTC
Hi Jill,

NTFRS versions used:

ntfrs.exe 5.2.3790.121
ntfrsapi.dll 5.2.3790.121
ntfrsprf.dll 5.2.3790.121
ntfrsutl.exe 5.2.3790.0

If Product Support Services contact me, I will be happy to answer any
questions and assist them with this problem.

Dawn
Jill Zoeller [MSFT]
2004-10-21 21:35:10 UTC
It looks like you are running the latest version of NTFRS, based on the info
below (copied from the KB article).

Date         Time   Version       Size     File name
-------------------------------------------------------
23-Jan-2004  01:49  5.2.3790.121  772,096  Ntfrs.exe
23-Jan-2004  01:49  5.2.3790.121   57,856  Ntfrsapi.dll
23-Jan-2004  01:49  5.2.3790.121   21,504  Ntfrsprf.dll
23-Jan-2004  01:49  5.2.3790.123    9,728  Ntfrsutl.exe
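
A quick way to compare the two version lists in this thread is to parse each dotted version into a tuple of integers, so that .0 and .123 compare numerically rather than as text. The sketch below uses only the version numbers posted here; the function names are illustrative. With these figures, Ntfrs.exe and the two DLLs match the KB table, and only Ntfrsutl.exe (the command-line utility, not the service binary) reports a lower version.

```python
def parse_version(text):
    """Turn '5.2.3790.121' into (5, 2, 3790, 121) for numeric comparison."""
    return tuple(int(part) for part in text.split("."))


# Versions reported on the server earlier in the thread.
INSTALLED = {
    "ntfrs.exe": "5.2.3790.121",
    "ntfrsapi.dll": "5.2.3790.121",
    "ntfrsprf.dll": "5.2.3790.121",
    "ntfrsutl.exe": "5.2.3790.0",
}

# Versions listed in the KB table above.
HOTFIX = {
    "Ntfrs.exe": "5.2.3790.121",
    "Ntfrsapi.dll": "5.2.3790.121",
    "Ntfrsprf.dll": "5.2.3790.121",
    "Ntfrsutl.exe": "5.2.3790.123",
}


def files_behind(installed, required):
    """Return the required files that are missing or older than required.
    File names are compared case-insensitively."""
    installed_lower = {name.lower(): ver for name, ver in installed.items()}
    behind = []
    for name, wanted in required.items():
        have = installed_lower.get(name.lower())
        if have is None or parse_version(have) < parse_version(wanted):
            behind.append(name)
    return behind


print(files_behind(INSTALLED, HOTFIX))  # prints ['Ntfrsutl.exe']
```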


I'm sorry I wasn't clearer earlier--I was recommending that you contact PSS
directly (open a support case) although I understand that not all customers
have the money to do this. Is this an option for you?
--
This posting is provided "AS IS" with no warranties, and confers no rights.