Development / Re: Broken character encoding
« on: December 27, 2011, 07:04:36 pm »
Quote: Could you please check whether the problem persists with the current beta version?
Yes, it does. The beta shows exactly the same problem.
Quote: Are you sure that the actual data in memory looks like this? I have investigated the problem and it looks correct in memory, but gets scrambled when saving to file (probably because the encoding information is lost).
Unfortunately, the strings in memory are broken as well. That's how I noticed. The file output was just for debugging purposes.
Quote: Try to parse a page and return some data as the result to PVD. Would the fields be filled with correct strings?
(12/27/2011 5:07:47 PM) UpdateToolbar: 6
(12/27/2011 5:07:47 PM) UpdateToolbar: 7
(12/27/2011 5:07:56 PM) Compiling script: dmm_0_0_4.psf
(12/27/2011 5:07:56 PM) Script compiled successfully: dmm_0_0_4.psf
[Hint] (381:2): Variable 'P' never used
(12/27/2011 5:07:56 PM) Executing script binary
(12/27/2011 5:07:56 PM) Logging in...
(12/27/2011 5:07:56 PM) Searching movie information for: djk
(12/27/2011 5:07:56 PM) GET: http://www.dmm.co.jp/search/=/searchstr=djk/analyze=V1EBCFcEUAc_/limit=30/sort=rank_asc/view=text/num=1/
(12/27/2011 5:07:58 PM) start!
(12/27/2011 5:07:58 PM) ParseSearchResults! !
(12/27/2011 5:07:58 PM) CodePage:0
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk11/
(12/27/2011 5:07:58 PM) Title: 少女の道・E11
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk18/
(12/27/2011 5:07:58 PM) Title: 少女の道・E18
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk19/
(12/27/2011 5:07:58 PM) Title: 少女の道・E19
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk07/
(12/27/2011 5:07:58 PM) Title: 少女の道・E7
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk14/
(12/27/2011 5:07:58 PM) Title: 少女の道・E14
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk10/
(12/27/2011 5:07:58 PM) Title: 少女の道・E10
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/m_full/-/detail/=/cid=djk09/
(12/27/2011 5:07:58 PM) Title: 少女の道・E9
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=51djk009/
(12/27/2011 5:07:58 PM) Title: 猥E熟・E8 岡・EE/a></p>
<p class="status">
<span class="ico-st-monopoly"><span>独・E/span></span>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GAVhfWkIHWw__/n2=Aw1fVhQKX1ZRAlhMUlo5QQgBU1lR/sort=rank_asc/view=text/num=1/">ビデオ
(12/27/2011 5:07:58 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=51djk009/
(12/27/2011 5:07:58 PM) Title: 猥E熟・E8 岡・EE/a>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GCF5WR14KTg__/n2=Aw1fVhQKX19XC0VQX085WgALX1c_/sort=rank_asc/view=text/num=1/">マニア
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=djks01/
(12/27/2011 5:07:59 PM) Title: 渋谷女子校生 少女の道・E8時間
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=h_275tdjk00016/
(12/27/2011 5:07:59 PM) Title: 宅配露出 2
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=h_275tdjk00004/
(12/27/2011 5:07:59 PM) Title: 宅配露出
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=h_275tdjk00015/
(12/27/2011 5:07:59 PM) Title: ・・E灰鷯・憤E中・E弧穏泪Eぅ芛/a></p>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GAVhfWkIHWw__/n2=Aw1fVhQKX1ZRAlhMUlo5QQgBU1lR/sort=rank_asc/view=text/num=1/">ビデオ
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=h_157djk010/
(12/27/2011 5:07:59 PM) Title: 女子校生見せつけオナニー
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=h_157djk010/
(12/27/2011 5:07:59 PM) Title: 女子校生見せつけオナニー
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=41djk012/
(12/27/2011 5:07:59 PM) Title: 猥らなほどに悩ましい 古都ひか・E/a></p>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GAVhfWkIHWw__/n2=Aw1fVhQKX1ZRAlhMUlo5QQgBU1lR/sort=rank_asc/view=text/num=1/">ビデオ
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/hmp/-/detail/=/cid=41djk012/
(12/27/2011 5:07:59 PM) Title: 猥らなほどに悩ましい 古都ひか・E/a>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GCF5WR14KTg__/n2=Aw1fVhQKX19XC0VQX085XwwV/sort=rank_asc/view=text/num=1/">h.m.p
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=djks03/
(12/27/2011 5:07:59 PM) Title: 渋谷女子校生 少女の道・E8時間 3
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=41hodv00211/
(12/27/2011 5:07:59 PM) Title: ・Eぅ廚気E拭・古都ひか・E/a></p>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GAVhfWkIHWw__/n2=Aw1fVhQKX1ZRAlhMUlo5QQgBU1lR/sort=rank_asc/view=text/num=1/">ビデオ
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/hmp/-/detail/=/cid=41hodv00211/
(12/27/2011 5:07:59 PM) Title: ・Eぅ廚気E拭・古都ひか・E/a>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GCF5WR14KTg__/n2=Aw1fVhQKX19XC0VQX085XwwV/sort=rank_asc/view=text/num=1/">h.m.p
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/digital/videoa/-/detail/=/cid=djks02/
(12/27/2011 5:07:59 PM) Title: 渋谷女子校生 少女の道・E8時間 2
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/mono/dvd/-/detail/=/cid=djks02/
(12/27/2011 5:07:59 PM) Title: 渋谷女子校生 少女の道・E8時間 2
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/mono/dvd/-/detail/=/cid=djks01/
(12/27/2011 5:07:59 PM) Title: 渋谷女子校生 少女の道・E8時間
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/mono/dvd/-/detail/=/cid=h_275tdjk16/
(12/27/2011 5:07:59 PM) Title: 宅配露出 2
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/mono/dvd/-/detail/=/cid=10dnsh001/
(12/27/2011 5:07:59 PM) Title: パンチラ☆ちら見電車 〜チラチラ片道切符〜
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=29djkc04/
(12/27/2011 5:07:59 PM) Title: 痴女×M・E2
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=29djkc01/
(12/27/2011 5:07:59 PM) Title: 痴女×M・E/a>
<td><a href="/search/=/searchstr=djk/limit=30/n1=FgRCTw9VBA4GCF5WR14KTg__/n2=Aw1fVhQKX19XC0VQX085WgALX1c_/sort=rank_asc/view=text/num=1/">マニア
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=29djkb01/
(12/27/2011 5:07:59 PM) Title: 爆乳痴態娘 1 水沢ダイア
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=29djkb02/
(12/27/2011 5:07:59 PM) Title: 爆乳痴態娘 2 真咲菜々
(12/27/2011 5:07:59 PM) URL: http://www.dmm.co.jp/monthly/mania/-/detail/=/cid=29djkj01/
(12/27/2011 5:07:59 PM) Title: 痴的女教師
procedure ParseSearchResults(HTML : String);
var
  curPos, EndPos, P : Integer;
  Title, URL : String;
begin
  EndPos := 1;
  LogMessage('CodePage:' + IntToStr(GetCodePage));
  curPos := PosFrom('<table summary="', HTML, EndPos);
  EndPos := curPos + Length('<table summary="');
  while curPos > 0 do begin
    //EndPos := curPos + Length('<table summary="');
    curPos := PosFrom('<td><p class="ttl">', HTML, EndPos);
    if curPos > 0 then begin
      curPos := PosFrom('<a href="', HTML, EndPos);
      if curPos > 0 then begin
        curPos := curPos + Length('<a href="');
        EndPos := PosFrom('">', HTML, curPos);
        URL := BASE_URL + Copy(HTML, curPos, EndPos - curPos);
        LogMessage('URL: ' + URL); //debug
        curPos := EndPos + 2;
        EndPos := PosFrom('</a>', HTML, curPos);
        Title := Copy(HTML, curPos, EndPos - curPos);
        EndPos := PosFrom('<td>', HTML, EndPos) + 2;
        EndPos := PosFrom('<td>', HTML, EndPos) + 2;
        LogMessage('Title: ' + Title); //debug
        AddSearchResult(Title, '', '', URL, '');
      end;
    end;
    curPos := PosFrom('<p class="ttl"', HTML, EndPos);
  end;
end;
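The log shows what happens when the closing `</a>` of a title is damaged: the search for `</a>` runs on to the next link, and the title swallows the markup in between. Below is a minimal sketch of a guard against that runaway copy. It is not a fix for the encoding problem itself. It is written as a standalone Free Pascal program so it can be tried outside PVD; `PosFrom` here is a local stand-in for the PVD built-in of the same name, `ExtractLinkText` is a hypothetical helper, and the two sample rows are made-up ASCII data that only imitate the damage.

```pascal
program GuardedTitleDemo;
{$mode objfpc}{$H+}

uses
  StrUtils;

// Local stand-in for PVD's built-in PosFrom, so the sketch runs outside PVD.
function PosFrom(const SubStr, S: String; FromIndex: Integer): Integer;
begin
  if FromIndex < 1 then
    FromIndex := 1;
  Result := PosEx(SubStr, S, FromIndex);
end;

// Hypothetical helper: returns the text from StartPos up to the next tag.
// Damaged is set when that next tag is not the expected '</a>'.
function ExtractLinkText(const HTML: String; StartPos: Integer;
  var Damaged: Boolean): String;
var
  NextTag: Integer;
begin
  Result := '';
  Damaged := True;
  NextTag := PosFrom('<', HTML, StartPos);
  if NextTag = 0 then
    Exit; // no tag after StartPos: nothing safe to copy
  Damaged := PosFrom('</a>', HTML, StartPos) <> NextTag;
  Result := Copy(HTML, StartPos, NextTag - StartPos);
end;

const
  // Made-up sample rows; '?E/a>' imitates a swallowed '<' in '</a>'.
  GoodRow = '<a href="/x/">Sample title</a></p><td><a href="/y/">Video</a>';
  BadRow  = '<a href="/x/">Sample titl?E/a></p><td><a href="/y/">Video</a>';

var
  Damaged: Boolean;
  StartPos: Integer;
begin
  // Skip past the '">' that ends the opening <a ...> tag.
  StartPos := PosFrom('">', GoodRow, 1) + 2;
  WriteLn(ExtractLinkText(GoodRow, StartPos, Damaged), ' damaged=', Damaged);

  StartPos := PosFrom('">', BadRow, 1) + 2;
  WriteLn(ExtractLinkText(BadRow, StartPos, Damaged), ' damaged=', Damaged);
end.
```

For the intact row the helper returns the link text and reports the entry as undamaged. For the damaged row it stops at `</p>` instead of running on to the next `</a>`, and flags the entry so the script could log or skip it.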
//This version of script is for use with PVD versions and above!!!
Additional types and functions that can be used in scripts:
TWIDEARRAY : array of String
//Field functions
procedure AddSearchResult(Title1, Title2, Year, URL, PreviewURL : String)
procedure AddFieldValue(AField: Integer; AValue : String)
procedure AddMoviePerson(Name, TransName, Role, URL : String; AType : Byte)
procedure AddPersonMovie(Title, OrigTitle, Role, Year, URL : String; AType : Byte)
procedure AddAward(Event, Award, Category, Recipient, Year: String; const Won : Boolean)
procedure AddConnection(Title, OrigTitle, Category, URL, Year: String)
procedure AddEpisode(Title, OrigTitle, Description, URL, Year, Season, Episode : String)
//String functions
function Pos(Substr : String; Str: String): Integer
function PosFrom(const SubStr, Str : String; FromIndex : Integer) : Integer
function LastPos(const SubStr, Str : String) : Integer
function PrevPos(const SubStr, Str : String; APos : Integer) : Integer
function RemoveTags(AText : String; doLineBreaks : Boolean) : String
function ExplodeString(AText : String; var Items : TWideArray; Delimiters : String) : Integer
function Copy(S: String; Index, Count: Integer): String
procedure Delete(var S: String; Index, Count: Integer)
procedure Insert(Source: String; var Dest: String; Index: Integer)
function Length(S: String): Integer
function Trim(S: String): String
function CompareText(S1, S2: String): Integer
function CompareStr(S1, S2: String): Integer
function UpperCase(S: String): String
function LowerCase(S: String): String
function StringReplace(S, OldPattern, NewPattern: String; ReplaceAll : Boolean; IgnoreCase : Boolean; WholeWord: Boolean): String
function StrToInt(const S: String): Integer
function IntToStr(const Value: Integer): String
function StrToFloat(const S: String): Extended
function FloatToStr(const Value: Extended): String
function HTMLValues(const HTML : String; ABegin, AEnd, ItemBegin, ItemEnd : String; ValDelim : String; var Pos : Integer) : String
function HTMLValues2(const HTML : String; ABegin, AEnd, ItemBegin, ItemEnd : String; ValDelim : String; var Pos : Integer) : String
function TextBetween(const HTML : String; ABegin, AEnd : String; doLineBreaks : Boolean; var Pos : Integer) : String
function HTMLToText(const HTML : String) : String
procedure ShowMessage(const Msg, Head : String)
const
pauseBeforeLoad = 0; // Pause before loading (in milliseconds)
//Some useful constants
//Script types
stMovies = 0;
stPeople = 1;
stPoster = 2;
//Script modes
smSearch = 0;
smNormal = 1;
smPoster = 2;
//Parse results
prError = 0;
prFinished = 1;
prList = 2;
prListImage = 3;
prDownload = 4;
//Movie fields
mfURL = 0;
mfTitle = 1;
mfOrigTitle = 2;
mfAka = 3;
mfYear = 4;
mfGenre = 5;
mfCategory = 6;
mfCountry = 7;
mfStudio = 8;
mfMPAA = 9;
mfRating = 10;
mfTags = 11;
mfTagline = 12;
mfDescription = 13;
mfDuration = 14;
mfFeatures = 15;
//People fields
pfURL = 0;
pfName = 1;
pfTransName = 2;
pfAltNames = 3;
pfBirthday = 4;
pfBirthplace = 5;
pfGenre = 6;
pfBio = 7;
pfDeathDate = 8;
//Credits types
ctActors = 0;
ctDirectors = 1;
ctWriters = 2;
ctComposers = 3;
ctProducers = 4;
//Script data
SCRIPT_NAME = 'DMM.co.jp';
SCRIPT_DESC = '[EN] Get movie information DMM.co.jp';
SCRIPT_LANG = $11; //Japanese //Tested both English & Japanese
SCRIPT_TYPE = stMovies;
BASE_URL = 'http://www.dmm.co.jp';
SEARCH_STR = 'http://www.dmm.co.jp/search/=/searchstr=%s/analyze=V1EBCFcEUAc_/limit=30/sort=rank_asc/view=text/num=1/';
CODE_PAGE = 20932; // Tested: 0, 33722, 51932, 20932, 65001
//Global variables
var
Mode : Byte;
PosterURL : String;
function GetScriptVersion : String;
function GetScriptName : String;
begin
  Result := SCRIPT_NAME;
end;
function GetScriptDesc : String;
begin
  Result := SCRIPT_DESC;
end;
function GetRatingName : String;
begin
  Result := RATING_NAME;
end;
function GetScriptLang: Cardinal;
begin
  Result := SCRIPT_LANG;
end;
function GetCodePage : Cardinal;
begin
  Result := CODE_PAGE;
end;
function GetBaseURL : AnsiString;
begin
  Result := BASE_URL;
end;
function GetDownloadURL : AnsiString;
begin
  if PosterURL = '' then
    Result := SEARCH_STR
  else
    Result := PosterURL;
end;
function GetScriptType : Byte;
begin
  Result := SCRIPT_TYPE;
end;
function GetCurrentMode : Byte;
begin
  Result := Mode;
end;
procedure FindPoster(HTML : String);
var
  curPos, EndPos : Integer;
begin
  // not yet implemented
end;
// ****************** ParseMovie
procedure ParseMovie(MovieURL : String; HTML : String);
var
  curPos, EndPos, P, P2, L : Integer;
  Tmp, URL, Name : String;
begin
  AddFieldValue(mfURL, MovieURL);
  // not yet implemented
end;
// ****************** ParseSearchResults
procedure ParseSearchResults(HTML : String);
var
  curPos, EndPos, P : Integer;
  Title, URL, Tabulka : String;
begin
  curPos := PosFrom('<table summary="', HTML, 1); //beginning of results table
  EndPos := PosFrom('</table>', HTML, curPos); //end of results table
  Tabulka := Copy(HTML, curPos, EndPos - curPos);
  StringToFile('wholepage.htm', HTML, false, false); //debug - output test
  StringToFile('tabulka.htm', Tabulka, false, false); //debug - output test
  // LogMessage('CodePage:' + IntToStr(GetCodePage));
end;
// ****************** ParsePage
function ParsePage(HTML : String; URL : AnsiString) : Cardinal;
begin
  Wait(pauseBeforeLoad);
  //HTML := ConvertEncoding(HTML, 20932);
  StringToFile('start.htm', HTML, false, false); //debug - output test
  if Pos('<div class="othertxt">', HTML) > 0 then begin
    LogMessage('Nothing Found!'); //debug
    Result := prError
  end else
  if Pos('<th scope="col">', HTML) > 0 then begin
    LogMessage('Calling ParseSearchResults!'); //debug
    Result := prList;
  end else
  if Pos('<h1 id="title"', HTML) > 0 then begin
    LogMessage('Product page!'); //debug
    Mode := smPoster;
    if PosterURL <> '' then
      Result := prDownload
    else
      Result := prFinished;
  end else begin
    ParseMovie(URL, HTML);
    Mode := smNormal;
    if PosterURL = '' then
      Result := prFinished
    else
      Result := prDownload;
  end;
end;
begin
  Mode := smSearch;
end.
<a href="/digital/videoa/-/detail/=/cid=41djk012/">猥らなほどに悩ましい 古都ひかる</a></p>
becomes
<a href="/digital/videoa/-/detail/=/cid=41djk012/">猥らなほどに悩ましい 古都ひか・E/a></p>
which, besides changing the text, destroys the whole HTML structure.
Further examples:
~ -> ?
奥さん! -> ・E気鵝・
女 2 -> ・E2