Discussion:
DFS Replication of 400 GB
Joerg
2009-07-19 10:12:17 UTC
Hi NG,

we are storing users' home directories on a domain controller running Windows
Server 2003 Enterprise (not R2!), and we would like to replicate approx. 400 GB
of data to a second W2K3 Enterprise (not R2) server using DFS with NTFRS.

- What staging size would I need on both servers?

As I understand it, the staging area defaults to roughly 600 MB. Does that mean
the service can replicate 600 MB per pass, or 600 MB in total? If the latter,
would I need a staging size of more than 400 GB? We have some Outlook PST files
that are about 3 GB each, so 600 MB seems a little tight ;)

- What replication topology would be most appropriate for this scenario? I can
choose from ring, hub/spoke and full mesh.

If I haven't misunderstood DFS, users are pointed to one of the available DFS
root targets for this share more or less at random. Is a ring topology useful
at all, then? I would go for full mesh; please correct me if I'm wrong.

- And most importantly: how long will it take to replicate 400 GB of data over
a 1 Gbit/s LAN link? I thought of copying the data from server 1 to server 2
first and then starting replication, but I'm afraid NTFRS will duplicate all
folders under the _NTFRS_<hex> naming scheme so as not to overwrite existing
data...

Thanks a lot
Joerg
Germany
Anthony [MVP]
2009-07-19 13:29:18 UTC
Joerg,
The staging area is just temporary working space. When it is full, replication
waits. The idea is that in normal operation the amount of changed data at any
one time is less than the staging area; obviously during the initial
replication it will be full.
The thing to bear in mind is whether the initial replication, or the ongoing
volume of changes, can ever complete. For example, a PST is replicated as a
single whole file every time it changes, so changes can back up to the point
where replication never completes. PSTs and databases in general are very poor
candidates for replication.
Also, PSTs are not supported when stored on the network.

Pre-R2, clients use the locator to choose a DFS target, so they will choose
based on AD site, provided you have Sites and Services set up. If both servers
are in the same site, as I assume they are with a 1 Gbps connection, clients
could choose either target, so you will need two-way replication. The default
topology will do; with just two targets there is nothing to choose.

How long will the initial replication take? I don't know, but over 1 Gbps it
will be fairly quick. You can't just copy the data across first: if a folder
already exists on the target, FRS will move its contents into a
pre-existing-data folder. There is no supported way to pre-stage the first
replica, and pre-staging is only relevant when you have slow links and a
faster way to get the data there. In your case you would be copying it over
the same connection anyway.
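As a back-of-the-envelope lower bound, 400 GB pushed at a full 1 Gbit/s takes
just under an hour; real FRS throughput is far lower (staging, compression,
journaling, per-file processing), so treat this strictly as a best case:

```python
# Wire-speed lower bound for replicating 400 GB over a 1 Gbit/s link.
# This ignores all FRS overhead, so the real figure will be much longer.
data_gb = 400                      # payload in gigabytes (10^9 bytes)
link_gbps = 1                      # link speed in gigabits per second
seconds = data_gb * 8 / link_gbps  # bits to move / bits per second
print(f"{seconds:.0f} s ≈ {seconds / 3600:.1f} h")  # 3200 s ≈ 0.9 h
```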
Anthony
http://www.airdesk.com
DaveMills
2009-07-20 06:14:18 UTC
Post by Anthony [MVP]
Pre R2, clients will use the Locator service to choose a DFS target; so they
will choose based on AD site provided you have Sites and Services set up.
Where can I find docs on the post-R2 method?
--
Dave Mills
There are 10 types of people, those that understand binary and those that don't.
Jörg
2009-07-20 07:09:44 UTC
Thanks for your reply Anthony.
So if I understand correctly, there is effectively no way to replicate that
amount of data over DFS, because the volume of changes to the user data may
become too high to ever finish replicating within the available time window.
Also, is there a way to tell DFS not to publish the second root target until
the data is completely replicated? As soon as I add the second target, users
can access it (and get an empty directory when connected to server 2).

cheers
Joerg
DaveMills
2009-07-20 16:04:23 UTC
Post by Jörg
Also, is there a way to tell DFS not to publish the second root target until
the data is completely replicated? As soon as I add the second target, users
can access it (and get an empty directory when connected to server 2).
Bear in mind that the DFS referral subsystem is independent of the DFS
replication system; you can have either without the other.

You can prevent referrals by disabling the second target. That is what I do
when setting up a new replica: disable referrals until I am satisfied the
replica is working correctly. Of course there cannot be an automatic
disable-on-out-of-sync feature: if two people changed two different files at
the same time, one on each replica, both replicas would be out of sync, so
both would have to be disabled and the whole system would collapse.
--
Dave Mills
There are 10 types of people, those that understand binary and those that don't.
Anthony [MVP]
2009-07-23 20:30:41 UTC
It's not so much that there is no way; you have to estimate the frequency and
volume of changes.
One trap with FRS is that changing permissions on the root triggers a full
replication, and changing them again triggers another. When you have three or
four replications of 400 GB queued, it will never complete. You have the same
problem on a smaller scale with a large PST.
In DFS/FRS you can disable a replica, but that also disables replication. In
DFSR (R2) the two are independent, so you can replicate folders without
publishing them as DFS targets and vice versa.

Anthony,
http://www.airdesk.com