• Dupebasetype

    From Alan Ianson@1:153/757 to All on Saturday, August 13, 2016 16:47:50
    Hello All,

    I'm getting dupes in my dupes area that are caused by duplicate MSGIDs. The message content is different though, so I wonder if I need to add any keywords to my config so that hpt will use both the MSGID and the CRC of the message.

    Looking through the files I have here I see mention of DupeBaseType but no information to go along with it. Can anyone post for me the different DupeBaseType's and how to enable them?

    Ttyl :-),
    Al

    ... Synonym: A word you use when you can't spell the other.
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Kai Richter@2:240/1351.7 to Alan Ianson on Sunday, August 14, 2016 11:51:22
    Hello Alan,

    Looking through the files I have here I see mention of DupeBaseType
    but no information to go along with it. Can anyone post for me the different DupeBaseType's and how to enable them?

    That's part of the hpt/doc/hpt.texi information within the source. It seems like your installation may be incomplete; the html and .info docs should be generated from that .texi.

    @node DupeBaseType, DupeHistoryDir, AreasMaxDupeAge, Files and Paths
    @subsection DupeBaseType
    @findex DupeBaseType
    @table @asis
    @item Syntax:
    ...
    @code{dupeBaseType <TextDupes | HashDupes | HashDupesWMsgId | CommonDupeBase>} >...
    @item Example:
    @code{dupeBaseType HashDupesWMsgId}
    @end table

    I run the default "DupeBaseType HashDupesWMsgId"

    My source is quite old, so you may want to check against the current version.

    Bye

    Kai

    --- GoldED+/LNX 1.1.4.7
    * Origin: Linux: Play with the penguin. (2:240/1351.7)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Sunday, August 14, 2016 09:33:10

    13 Aug 16 16:47, you wrote to All:

    I'm getting dupes in my dupes area that are caused by duplicate
    MSGIDs. The message content is different though, so I wonder if I
    need to add any keywords to my config so that hpt will use both the MSGID
    and the CRC of the message.

    if

    1. the MSGID is the same but the content is different
    2. they're all coming from the same system
    3. they're all within a three year period

    then

    the system posting them is using flawed software...

    there's little you can do to correct it... passing them on to other systems will cause those other systems the same problem you are seeing... the best thing to do is to let the originating system know of the problem and ask them to FixTheirShit<tm> ;)

    )\/(ark

    Always Mount a Scratch Monkey

    ... Congress should simply make all crime illegal!
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Sunday, August 14, 2016 09:49:50

    14 Aug 16 11:51, Kai Richter wrote to you:

    Looking through the files I have here I see mention of DupeBaseType
    but no information to go along with it. Can anyone post for me the
    different DupeBaseType's and how to enable them?

    That's part of the hpt/doc/hpt.texi information within the source. It seems like your installation may be incomplete, from that .texi the
    html and .info docs should be generated.

    they didn't seem to be generated on my system so i've quickly manually generated the info flavor... here's what it says about DupeBaseType...

    ===== snip =====

    File: hpt.info, Node: DupeBaseType, Next: DupeHistoryDir, Prev: AreasMaxDupeAge, Up: Files and Paths

    3.3.3 DupeBaseType
    ------------------

    Syntax:
    'dupeBaseType <TextDupes | HashDupes | HashDupesWMsgId |
    CommonDupeBase>'
    Example:
    'dupeBaseType HashDupesWMsgId'

    TextDupes
    stores from, to, subj & msgid as text lines.
    HashDupes
    stores crc32 of from + to + subj + msgid.
    HashDupesWMsgId
    same as HashDupes, but stores also msgid as text.
    CommonDupeBase
    stores hashes of from + to + subj + areatag + msgid in one file
    (hpt_base.dpa)

    Default is 'HashDupesWMsgId'.

    This statement cannot be repeated.

    ===== snip =====

    so it looks like you get to choose one and only one DupeBaseType for all areas... you cannot mix and match... i'm also using HashDupesWMsgId...
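
To make the documented field list concrete, here is a minimal sketch of a HashDupes-style key. It is not hpt's actual code: the function name `dupe_key`, the plain concatenation of the fields, the UTF-8 encoding and the sample values are all assumptions made for illustration. The only thing taken from the doc above is that the hash covers from + to + subj + msgid and nothing else.

```python
import zlib


def dupe_key(frm: str, to: str, subj: str, msgid: str) -> int:
    """CRC32 over from + to + subj + msgid (hypothetical layout).

    The message body is deliberately not a parameter: per the doc text,
    it plays no part in the dupe key.
    """
    data = "".join((frm, to, subj, msgid)).encode("utf-8")
    return zlib.crc32(data) & 0xFFFFFFFF


# Same header fields give the same key, whatever the body says.
first = dupe_key("Some User", "All", "Test subject", "1:2/3 0000abcd")
again = dupe_key("Some User", "All", "Test subject", "1:2/3 0000abcd")

# Changing any of the four fields gives a different input to the CRC.
other_subj = dupe_key("Some User", "All", "Another subject", "1:2/3 0000abcd")
```

Under this sketch, two messages that differ only in their text would be treated as the same message, which matches the doc's description of what is stored.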

    )\/(ark

    Always Mount a Scratch Monkey

    ... Alimony is the screwing you get for the screwing you got.
    ---
    * Origin: (1:3634/12.73)
  • From Alan Ianson@1:153/757 to Kai Richter on Sunday, August 14, 2016 17:18:00
    Sunday August 14 2016 11:51, you wrote to me:

    Looking through the files I have here I see mention of
    DupeBaseType but no information to go along with it. Can anyone
    post for me the different DupeBaseType's and how to enable them?

    That's part of the hpt/doc/hpt.texi information within the source. It seems like your installation may be incomplete, from that .texi the
    html and .info docs should be generated.

    The last few times I have compiled husky I was getting errors so I disabled the
    DVIDIR statement in my huskymak.cfg. On my next compile I will enable it again
    and see if I have better results.

    @code{dupeBaseType <TextDupes | HashDupes | HashDupesWMsgId | CommonDupeBase>}

    Thanks for that info. I have not used that statement in my config so I think I am also using HashDupesWMsgid. In the past that has always worked well for me.

    Ttyl :-),
    Al

    ... I tried to drown my problems.. they like beer too!
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Alan Ianson@1:153/757 to mark lewis on Sunday, August 14, 2016 17:24:00
    Sunday August 14 2016 09:33, you wrote to me:

    I'm getting dupes in my dupes area that are caused by duplicate
    MSGIDs. The message content is different though, so I wonder if
    I need to add any keywords to my config so that hpt will use both
    the MSGID and the CRC of the message.

    if

    1. the MSGID is the same but the content is different
    2. they're all coming from the same system
    3. they're all within a three year period

    then

    the system posting them is using flawed software...

    Yes, it is Mystic. James just added MSGID support when posting via QWK but it seems to be using the same MSGID more than once. I think we can solve that if indeed that is what is happening.. I'm trying to get as much good info about what is happening and pass it on to him.

    there's little you can do to correct it... passing them on to other systems will cause those other systems the same problem you are
    seeing... the best thing to do is to let the originating system know
    of the problem and ask them to FixTheirShit<tm> ;)

    Yes, we want that problem solved and I'll pass on what I see to James.

    But I am thinking that hpt should not consider those msgs as dupes because the content is different in all those msgs?

    In the case of those mails I am a leaf node.. so no harm done. But...

    Ttyl :-),
    Al

    ... Remember when safe sex meant not getting caught?
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Alan Ianson@1:153/757 to mark lewis on Sunday, August 14, 2016 17:31:00
    Sunday August 14 2016 09:49, you wrote to me:


    so it looks like you get to choose one and only one DupeBaseType for
    all areas... you cannot mix and match... i'm also using
    HashDupesWMsgId...

    Yes, I believe my setup is using HashDupesWMsgid by default as it has always done.

    Am I right in my assumption that in spite of the duplicate msgid (We'll fix that ASAP) the msgs should still pass dupe detection because of the different content?

    Ttyl :-),
    Al

    ... Overflow on /dev/null, please empty the bit bucket.
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Monday, August 15, 2016 07:40:00

    14 Aug 16 17:24, you wrote to me:

    I'm getting dupes in my dupes area that are caused by duplicate
    MSGIDs. The message content is different though, so I wonder if I
    need to add any keywords to my config so that hpt will use both
    the MSGID and the CRC of the message.

    if

    1. the MSGID is the same but the content is different
    2. they're all coming from the same system
    3. they're all within a three year period

    then

    the system posting them is using flawed software...

    Yes, it is Mystic.

    ouch :(

    James just added MSGID support when posting via QWK but
    it seems to be using the same MSGID more than once.

    why doesn't he use the same MSGID routine already being used by the rest of the
    system?? that's been working just fine for a while, hasn't it? oh but wait... importing QWK posts is like posting a lot of text files as messages... yeah, i can see how it may be that he's generating the same serial number in such a short time... especially if he's relying only on clock time and not also having
    a number that's incremented serially and stored in a data file so that other parts of the system can get the last number used before they increment it for their post...
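
The idea of a serially incremented number kept in a data file can be sketched roughly as follows. This is not Mystic's code and not the MSGID routine mentioned in this thread: the names `next_serial` and `make_msgid`, the hex file format and the one-time clock seed are assumptions. A real multi-node BBS would also need file locking around the read/increment/write step, which is left out here.

```python
import os
import tempfile
import time


def next_serial(path: str) -> int:
    """Return the next 32-bit serial number and persist it in `path`."""
    try:
        with open(path, "r", encoding="ascii") as fh:
            last = int(fh.read().strip(), 16)
    except (FileNotFoundError, ValueError):
        # No usable state yet: seed from the clock once, then only count up.
        last = int(time.time()) & 0xFFFFFFFF
    serial = (last + 1) & 0xFFFFFFFF
    tmp = path + ".tmp"
    with open(tmp, "w", encoding="ascii") as fh:
        fh.write(f"{serial:08x}")
    os.replace(tmp, path)  # swap the new state file into place
    return serial


def make_msgid(address: str, path: str) -> str:
    """Build an '<address> <8 hex digits>' MSGID string."""
    return f"{address} {next_serial(path):08x}"


def demo(count: int = 500) -> list:
    """Generate `count` MSGIDs back to back using a throwaway state file."""
    state = os.path.join(tempfile.mkdtemp(), "msgid.dat")
    return [make_msgid("1:2/3", state) for _ in range(count)]
```

Because every caller reads the last value from the file before incrementing, a burst of posts in the same second still gets distinct serial numbers, which a clock-only scheme cannot guarantee.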

    I think we can solve that if indeed that is what is happening.. I'm
    trying to get as much good info about what is happening and pass it on to him.

    i think i shared some MSGID code with him some time back... it is what i use here in my utilities and allows for something like 8000 messages per node per day... it was live tested for three years to ensure that it didn't generate dupes within that three year period like the spec calls for...

    there's little you can do to correct it... passing them on to other
    systems will cause those other systems the same problem you are
    seeing... the best thing to do is to let the originating system know
    of the problem and ask them to FixTheirShit<tm> ;)

    Yes, we want that problem solved and I'll pass on what I see to James.

    hehehe...

    But I am thinking that hpt should not consider those msgs as dupes
    because the content is different in all those msgs?

    no... not when the MSGID is used as the dupe detector... better would be to not
    have a MSGID at all instead of having duplicates...

    In the case of those mails I am a leaf node.. so no harm done. But...

    not yet but consider what happens if your feed needs to do a rescan from you to
    rebuild their system... feeds are two way and your feed has areafix access on your system the same way you have areafix access on theirs ;)

    )\/(ark

    Always Mount a Scratch Monkey

    ... We are everywhere! Unfortunately, so are they.
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Monday, August 15, 2016 07:48:00

    14 Aug 16 17:31, you wrote to me:

    Am I right in my assumption that in spite of the duplicate msgid
    (We'll fix that ASAP) the msgs should still pass dupe detection
    because of the different content?

    no... the content isn't looked at... according to the info doc i posted to you the other day, only the FROM, TO, SUBJECT and MSGID are used for dupe detection...

    )\/(ark

    Always Mount a Scratch Monkey

    ... It's hard to believe he beat out 1,000,000 other sperm.
    ---
    * Origin: (1:3634/12.73)
  • From Alan Ianson@1:153/757 to mark lewis on Monday, August 15, 2016 12:33:46
    Monday August 15 2016 07:40, you wrote to me:

    James just added MSGID support when posting via QWK but
    it seems to be using the same MSGID more than once.

    why doesn't he use the same MSGID routine already being used by the
    rest of the system?? that's been working just fine for a while, hasn't
    it?

    It has been working well for me since I started using it late in the 1.10 alpha
    versions, but it never included a MSGID when msgs were posted by QWK. It's not something that has been used a lot and I would get multiple copies of msgs when
    someone posted by Mystic's QWK system.

    It seems to be getting used a fair bit these days.. ;)

    Ttyl :-),
    Al

    ... OUT TO LUNCH - If not back at five, OUT TO DINNER!
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Alan Ianson@1:153/757 to mark lewis on Monday, August 15, 2016 12:41:10
    Monday August 15 2016 07:48, you wrote to me:

    no... the content isn't looked at... according to the info doc i
    posted to you the other day, only the FROM, TO, SUBJECT and MSGID are
    used for dupe detection...

    OK, we are going to need to be more careful with MSGID in that case.

    Ttyl :-),
    Al

    ... Abort, Retry, Fail, Ignore, Complain ?
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Tuesday, August 16, 2016 13:14:52

    15 Aug 16 12:33, you wrote to me:

    James just added MSGID support when posting via QWK but it seems to
    be using the same MSGID more than once.

    why doesn't he use the same MSGID routine already being used by the
    rest of the system?? that's been working just fine for a while,
    hasn't it?

    It has been working well for me since I started using it late in the
    1.10 alpha versions,

    yes... i'm very aware of that as i did a "bit" of testing of 1.10 back then ;)

    but it never included a MSGID when msgs were posted by QWK.

    i wasn't aware of that... all messages posted by users, whether manually written online (could even be ascii uploaded directly into the editor) or uploaded via some sort of offline mail stuff like QWK or bluewave, should have MSGID added *but* there's a problem there with offline mail and MSGIDs... that problem is that unless the BBS has the smarts to also add the proper REPLY control line, then neither can be added to offline mail... replies to posts with MSGID lines must have a REPLY line as well...

    It's not something that has been used a lot and I would get multiple copies of msgs when someone posted by Mystic's QWK system.

    that's par for the course when using multiple links for the same area (a la fidoweb or even a standard distribution star arrangement)... that's why relying
    solely on MSGID is not the best/proper way to go... it is one way, yes...

    )\/(ark

    Always Mount a Scratch Monkey

    ... E-mail: when it absolutely has to get lost at the speed of light.
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Tuesday, August 16, 2016 13:29:14

    15 Aug 16 12:41, you wrote to me:

    no... the content isn't looked at... according to the info doc i
    posted to you the other day, only the FROM, TO, SUBJECT and MSGID are
    used for dupe detection...

    OK, we are going to need to be more careful with MSGID in that case.

    remember... HPT is the one using only those fields... other tossers use a CRC of the header or the header and the body or even the header plus the first 40 or 50 bytes (which will get some control lines which also reinforces the idea that message contents are not modified in transit like some tossers that sort control lines instead of leaving them alone) and they may specifically include the MSGID if it exists... using the message body is fine, too, if you can spare
    the CPU cycles to strip out line endings and possibly extraneous white space characters... using the entire header is a good thing and is one reason why software should not be stingy with the time stamp like some are that force the seconds field to :00 instead of using the actual seconds... that limits to one post per second if seconds are a primary sensor used to detect duplicates... simplistic duplicate detection methods run afoul of software that can post hundreds of messages per second... especially if those messages do not have MSGIDs that do not repeat over a three year period...
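
A rough sketch of the "header plus body" style of key described above, for comparison with the header-only approach. It does not reproduce any particular tosser: the names `normalize_body` and `content_key`, the NUL separator, the choice of CRC32 and the sample field values are assumptions.

```python
import zlib


def normalize_body(body: str) -> str:
    """Drop line endings and collapse runs of whitespace to one space."""
    return " ".join(body.split())


def content_key(frm, to, subj, date, body, msgid=None) -> int:
    """CRC32 over the header fields, the normalized body and, if present,
    the MSGID (hypothetical layout)."""
    parts = [frm, to, subj, date, normalize_body(body)]
    if msgid:
        parts.append(msgid)
    return zlib.crc32("\x00".join(parts).encode("utf-8")) & 0xFFFFFFFF


# The same text with different line endings and spacing maps to one key.
crlf = content_key("User", "All", "Test", "18 Aug 16 06:36:00",
                   "line one\r\nline two\r\n", "1:2/3 0000abcd")
lf = content_key("User", "All", "Test", "18 Aug 16 06:36:00",
                 "line one\nline  two\n", "1:2/3 0000abcd")

# Different text under a reused MSGID feeds different bytes to the CRC.
changed = content_key("User", "All", "Test", "18 Aug 16 06:36:00",
                      "something else entirely", "1:2/3 0000abcd")
```

Including the full time stamp in the key is what makes the seconds field matter: if software forces it to :00, messages posted in the same minute with otherwise identical headers can only be told apart by the body or the MSGID.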

    )\/(ark

    Always Mount a Scratch Monkey

    ... I only started a BBS to save on the phone bill.
    ---
    * Origin: (1:3634/12.73)
  • From Alan Ianson@1:153/757 to mark lewis on Thursday, August 18, 2016 17:03:00
    Tuesday August 16 2016 13:14, you wrote to me:

    but it never included a MSGID when msgs were posted by QWK.

    i wasn't aware of that... all messages posted by users, whether
    manually written online (could even be ascii uploaded directly into
    the editor) or uploaded via some sort of offline mail stuff like QWK
    or bluewave, should have MSGID added *but* there's a problem there
    with offline mail and MSGIDs... that problem is that unless the BBS
    has the smarts to also add the proper REPLY control line, then neither
    can be added to offline mail... replies to posts with MSGID lines must have a REPLY line as well...

    Well.. that is all rather complicated. What if the BBS can't determine the message that is being replied to. Should it not add the MSGID in that case? Then we get back to the msg going around in circles again.

    It's not something that has been used a lot and I would get
    multiple copies of msgs when someone posted by Mystic's QWK
    system.

    that's par for the course when using multiple links for the same area
    (a la fidoweb or even a standard distribution star arrangement)...
    that's why relying solely on MSGID is not the best/proper way to go...
    it is one way, yes...

    I am connected with a node who does this. I guess he has his reasons so I don't
    question it. When that link gets a message for an area that doesn't exist on his system from my node the area is autocreated and I am added to his exports. When I see messages going around in a circle like that I disconnect the area with areafix but keep exporting to him. He gets his redundancy and I get one message.. :)

    I am up to 18 dupes here now. All caused by the same MSGID being reused. But the
    content and subject lines are all different. Makes me wonder if HPT is considering the MSGID alone.

    Ttyl :-),
    Al

    ... Discoveries are often made by not following instructions.
    --- GoldED+/LNX 1.1.5-b20160322
    * Origin: The Rusty MailBox - Penticton, BC Canada (1:153/757)
  • From Alan Ianson@1:153/757 to mark lewis on Thursday, August 18, 2016 17:42:28
    Thursday August 18 2016 17:03, I wrote to you:

    I am up to 18 dupes here now. All caused by the same MSGID being
    reused. But the content and subject lines are all different. Makes me wonder if HPT is considering the MSGID alone.

    Here is an example of what I am seeing here..


    ==== Begin "dupe.txt" ====
    = FSX_GEN (21:1/110) ==========================================================
    Msg : 5131 of 5194
    From : Tiny 21:1/130.2 18 Aug 16 06:36:00
    To : Mickey
    Subj : Re: cost of electricity
    ===============================================================================
    @TID: Mystic BBS 1.12 A31
    @MSGID: 21:1/130.2 0065e47d
    Mickey wrote to Tiny <=-

    Keeping in mind she won by a landslide, so in fact, the people got
    what they wanted. I have a hard time finding someone these days,
    that'll admit voting for them. Strange. :-)

    Laugh that is true! I've never voted liberal and don't think I would
    start now. ;)

    Shawn


    ... I love animals! But they all seem to taste like chicken.
    ___ MultiMail/Win32 v0.49

    -+- Mystic BBS/QWK v1.12 A31 (Raspberry Pi)

    * Origin: I only have pi's for you. (21:1/130.2)
    SEEN+BY: 1/1 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115
    SEEN+BY: 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 133
    SEEN+BY: 134 135 136 137 138 139 141 142 143 144 145 146 147 148 149 150 151
    SEEN+BY: 152 153 154 155 156 157 158 159 160 161 162 999
    @PATH: 1/130 100

    = DUPE (1:153/757) ============================================================
    Msg : 18 of 18
    From : Tiny 21:1/130.2 18 Aug 16 06:36:00
    To : Gryphon
    Subj : Re: BBS Websites
    ===============================================================================
    AREA:FSX_GEN
    @TID: Mystic BBS 1.12 A31
    @MSGID: 21:1/130.2 0065e47d
    Gryphon wrote to All <=-

    So does anybody have a wicked awesome example of a BBS website that I
    can totally copy off of? I've decided that I can't design a good
    website to save my life.

    Laugh that's why I've had the same web page for a dozen years. ;)

    Shawn
  • From Kai Richter@2:240/1351.7 to Alan Ianson on Friday, August 19, 2016 09:21:08
    Hello Alan!

    On 18 Aug 16, Alan Ianson wrote to mark lewis:

    The To, Subject and content is different though. Should HPT be doing
    that?

    What does HashDupesWMsgId mean? With? = Hash + msgID?
    Did you try "TextDupes"?

    The main problem would still be the same time and ID. Same time and ID means a dupe, by design.

    Your system runs fine and correct. You shouldn't work around that.
    All links of that broken msgID node would have the same problem.

    If all nodes along that message's path adjusted their dupe checks, your network would lose a dupe check function that usually works well.

    Bye

    Kai

    --- GoldED+/LNX 1.1.4.7
    * Origin: Cheap, Fast, Reliable - pick any two. (2:240/1351.7)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Friday, August 19, 2016 15:38:56

    18 Aug 16 17:03, you wrote to me:

    but it never included a MSGID when msgs were posted by QWK.

    i wasn't aware of that... all messages posted by users, whether
    manually written online (could even be ascii uploaded directly into
    the editor) or uploaded via some sort of offline mail stuff like QWK
    or bluewave, should have MSGID added *but* there's a problem there
    with offline mail and MSGIDs... that problem is that unless the BBS
    has the smarts to also add the proper REPLY control line, then
    neither can be added to offline mail... replies to posts with MSGID
    lines must have a REPLY line as well...

    Well.. that is all rather complicated.

    yes, exactly... welcome to a world where incompatible software and formats are shoehorned into use instead of embracing the existing standards and formats... QWK is a perfect example of this when FTN software and a point configuration is
    the way that FTN offline mail should really be done...

    What if the BBS can't determine the message that is being replied to. Should it not add the MSGID in that case?

    yes, exactly...

    Then we get back to the msg going around in circles again.

    sorry about that... blame the folks who are trying to drive MSGID/REPLY to being forced into use on everyone just like those that are forcing full SEEN-BY
    lines and dupes as the way of life...

    It's not something that has been used a lot and I would get multiple
    copies of msgs when someone posted by Mystic's QWK system.

    that's par for the course when using multiple links for the same area
    (a la fidoweb or even a standard distribution star arrangement)...
    that's why relying solely on MSGID is not the best/proper way to
    go... it is one way, yes...

    I am connected with a node who does this. I guess he has his reasons
    so I don't question it. When that link gets a message for an area that doesn't exist on his system from my node the area is autocreated and I
    am added to his exports. When I see messages going around in a circle
    like that I disconnect the area with areafix but keep exporting to
    him. He gets his redundancy and I get one message.. :)

    [smh]

    I am up to 18 dupes here now. All caused by the same MSGID being
    reused. But the content and subject lines are all different. Makes me wonder if HPT is considering the MSGID alone.

    according to the documentation, the TO, FROM, SUBJECT and MSGID should be being
    used...

    )\/(ark

    Always Mount a Scratch Monkey

    ... We lie loudest when we lie to ourselves.
    ---
    * Origin: (1:3634/12.73)
  • From mark lewis@1:3634/12.73 to Alan Ianson on Friday, August 19, 2016 15:37:00

    18 Aug 16 17:42, you wrote to me:

    The To, Subject and content is different though. Should HPT be doing that?

    we've already pointed out that /content/ is not considered... only TO, FROM, SUBJECT and MSGID ;)

    )\/(ark

    Always Mount a Scratch Monkey

    ... The PS/2 Model 30-286 was designed on a Monday.
    ---
    * Origin: (1:3634/12.73)