Need absolute path to cache language files #64

Open
tnbnicer opened this issue Apr 18, 2023 · 24 comments

Comments

@tnbnicer

I believe this to be an issue, or at least something that should be looked into.

phpSyllable-master/src/Cache/File.php on line 53:

file_put_contents($file, $this->encode(self::$data));

As far as I can tell, file_put_contents requires an absolute path, but what it gets instead is a relative path.

Warning: file_put_contents(/home/customer/www/mydomain/apps/phpSyllable-master/src/cache/syllable.en-us.json): failed to open stream: No such file or directory in /home/customer/www/mydomain/apps/phpSyllable-master/src/Cache/File.php on line 53.

I was able to work around it by hard-coding an absolute path in File.php:

private function filename()
{
    $dir = '/home/customer/www/mydomain/apps/phpSyllable-master/src/Cache';
    return $dir.'/'.$this->getFilename(self::$language);
}

Note: I'm not sure why 'syllable.en-us.json' has to be written at all. Is it for development purposes? As far as I know, the file never changes after it is first generated.

Also, file_put_contents works fine with a relative path on WampServer under Windows. I only get the error/warning in a shared virtual-hosting environment. If you cannot reproduce the issue, feel free to ask.
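For reference: PHP resolves a relative path passed to file_put_contents() against the current working directory (getcwd()), which can differ between a local WampServer setup and a shared host. The path in the warning above is already absolute, though, so whether its parent directory exists is the more telling check. One possible explanation worth ruling out: the warning targets src/cache (lowercase) while the package folder is src/Cache, and Linux file systems are case-sensitive whereas Windows ones are not. The sketch below, using a hypothetical helper named describeCacheTarget(), reports where a path resolves and whether its directory exists and is writable:

```php
<?php
// Hypothetical diagnostic helper: where would PHP write $file, and can it?
function describeCacheTarget(string $file): array
{
    // A leading "/" (or a drive letter such as "C:\" on Windows) marks an absolute path.
    $isAbsolute = ($file !== '' && $file[0] === '/')
        || preg_match('~^[A-Za-z]:[\\\\/]~', $file) === 1;

    // Relative paths are resolved against the current working directory.
    $resolved = $isAbsolute ? $file : getcwd() . DIRECTORY_SEPARATOR . $file;
    $dir = dirname($resolved);

    return [
        'resolved'    => $resolved,
        'absolute'    => $isAbsolute,
        'dirExists'   => is_dir($dir),
        'dirWritable' => is_dir($dir) && is_writable($dir),
    ];
}

print_r(describeCacheTarget('cache/syllable.en-us.json'));
```

Calling it with the exact path from the warning shows immediately whether the missing piece is the directory itself.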

@alexander-nitsche
Collaborator

alexander-nitsche commented Apr 18, 2023

Would you paste your full code snippet here? And where should the cache directory be?

@tnbnicer
Author

I'm not sure which code snippet you mean; it's more than a snippet, and I'm testing it on a live server. The only fix I applied is the one mentioned above. Since I started out running it on WAMP, the whole modified function actually looks like this (from line 30 of File.php):

private function filename()
{
    $remote_ip = "35.214.1.93";
    $server = (strpos(getenv( "SERVER_ADDR" ), $remote_ip) !== false) ? "remote" : "local";
    $remote = ($server == "remote") ? true : false;
    if ($remote) {
        $dir = '/home/customer/www/mydomain/apps/phpSyllable-masterTest/src/Cache';
    } else {
        $dir = self::$path;
    }
    return $dir.'/'.$this->getFilename(self::$language);
    //return self::$path.'/'.$this->getFilename(self::$language); # this is the original code that returns a relative path.
}

That seems to be the only way to get a valid path, for me anyway.

Apparently none of these files receives new data. I uploaded them directly from my WAMP installation on Windows yesterday:

mydomain/apps/phpSyllable-master/src/Cache/syllable.it.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.de-2017.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.en-us-old.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.de-1901.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.de.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.en-gb-old.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.cs.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.pl.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.de-1996.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.grc.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.tr.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.en-gb.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.en-us-berliner.json
mydomain/apps/phpSyllable-master/src/Cache/syllable.en-us.json

Why, then, does the cache need to write new data to them?

That is what this function does*:

public function close()
{
    $file = $this->filename();
    file_put_contents($file, $this->encode(self::$data));
    @chmod($file, 0777);
}

*I haven't changed it. As I understand it, file_put_contents requires a full path.
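Because close() ignores the return value of file_put_contents(), a missing directory only surfaces as a PHP warning. A minimal sketch of a stricter write, assuming you prefer an exception you can catch and log; writeCacheFile() is a hypothetical helper, not part of phpSyllable:

```php
<?php
// Hypothetical cache write that fails loudly instead of emitting a warning.
function writeCacheFile(string $file, string $contents): void
{
    $dir = dirname($file);
    if (!is_dir($dir)) {
        throw new RuntimeException("Cache directory does not exist: $dir");
    }
    // LOCK_EX holds an exclusive lock while writing;
    // file_put_contents() returns false on failure.
    if (@file_put_contents($file, $contents, LOCK_EX) === false) {
        throw new RuntimeException("Could not write cache file: $file");
    }
}
```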

@tnbnicer
Author

If you like, and provided GitHub permits it (probably not), I could post a link to the test page on my server that shows the error.

@tnbnicer
Author

tnbnicer commented Apr 18, 2023

Your question about where the cache directory should be is throwing me off a little, so please correct me if I'm wrong. The cache folder is initially defined in the main page; demo.php sets it like this:

$syllable->getCache()->setPath(dirname(__FILE__).'/cache');

In my case, it's defined like this:

$cache = $syllable->getCache();
$cache->setPath($dir.'/apps/phpSyllable-master/src/cache');

The only other caching is when the user uploads a file, as opposed to pasting text or html in the form. Here I define a different cache that optionally saves the user's file to the server:

if ($uploadOk == 1) {
	$server_path = $dir.'/apps/phpSyllable-master/tmp';
	if (is_dir($server_path)) :
		// pathinfo() copes with names containing several dots, e.g. "my.book.html"
		$info = pathinfo(basename($_FILES['userfile']['name']));
		$extension = $info['extension'] ?? '';
		$newname = $info['filename'].'_'.time();
		$full_server_path = $server_path.'/'.$newname.'.'.$extension;
		move_uploaded_file($_FILES['userfile']['tmp_name'], $full_server_path);
		$text = htmlspecialchars(file_get_contents($full_server_path));
	else:
		$text = htmlspecialchars(file_get_contents($_FILES["userfile"]["tmp_name"]));
	endif;
}
$source = $text;

If $server_path exists, the uploaded file is saved (cached) there. I'm not certain whether move_uploaded_file also requires a full path; I think it does, like file_put_contents.

The point, however, is that $cache->setPath($dir.'/apps/phpSyllable-masterTest/src/cache') doesn't do the trick for PHP's file_put_contents on line 53 of File.php. That's my interpretation, at least.

If I can assist, in any way, I'm happy to help. I'd tried a Perl hyphenator first. phpSyllable is orders of magnitude better. Brilliant achievement by Mr. Van Der Lee.

@alexander-nitsche
Collaborator

alexander-nitsche commented Apr 19, 2023

You need to make sure the phpSyllable cache directory exists; the file_put_contents() error suggests it does not. Run something like

$cacheDir = $dir.'/apps/phpSyllable-master/src/cache';
if (!is_dir($cacheDir)) {
  mkdir($cacheDir, 0777, true); // true: also create missing parent directories
}

in the main script somewhere before initialising and using the Syllable object.

Note: You should place the Syllable cache in a folder outside the phpSyllable app folder and not touch the app folder at all, as changes there make updating the package more difficult.
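The check-then-create snippet above can race if two requests arrive at the same moment: both see that the directory is missing, and the second mkdir() emits a warning. A common defensive pattern, sketched here as a hypothetical ensureDirectory() helper (the 0755 mode is an assumption, in line with typical shared-hosting advice), re-checks is_dir() after a failed mkdir():

```php
<?php
// Hypothetical helper: create $dir (and missing parents) if needed.
// Safe if another request creates the same directory in between.
function ensureDirectory(string $dir): void
{
    if (!is_dir($dir) && !@mkdir($dir, 0755, true) && !is_dir($dir)) {
        throw new RuntimeException("Could not create directory: $dir");
    }
}
```

Calling it twice is harmless: the second call finds the directory and does nothing.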

@tnbnicer
Author

You're right: I should make sure the folder and files exist before initialization. Technically the folder does exist, but for some reason it isn't found on the server, which causes the error.

On the other hand, threads on Stack Overflow and other online communities point out that you must pass the exact absolute folder path, including the file name, to file_put_contents().

The stumbling block for me is that no method at all writes to the folder, error or no error.

I take your suggestion to move the "cache" folder on board. I've been in touch with my web-hosting service. They tell me that because the cache folder is not in the publicly accessible area, its files are automatically cached. That, of course, won't do if the aim is to update the *.json files stored in the cache. So I am very tempted to move the "cache" folder to the publicly accessible area of the site; maybe that will fix something. It would be nice to close this issue. Issues are bad publicity, and it might be entirely my fault.

Thanks for the reply.

@alexander-nitsche
Collaborator

alexander-nitsche commented Apr 19, 2023

Use some project file structure like

<project>/
 ├── apps/phpSyllable-master/
 ├── cache/phpSyllable-master/
 └── main.php

and a main.php like

<?php
use Vanderlee\Syllable\Syllable;

function hyphenateText($text, $language) {
  $cacheDir = __DIR__.'/cache/phpSyllable-master';
  $languageDir = __DIR__.'/apps/phpSyllable-master/languages';

  if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0777, true);
  }
  
  $syllable = new Syllable($language);
  $syllable->getCache()->setPath($cacheDir);
  $syllable->getSource()->setPath($languageDir);
  
  return $syllable->hyphenateText($text);
}

echo hyphenateText(
    'WampServer ist eine komplette Webserver-Software inklusive Datenbank. ' .
    'Die Software ist eine Alternative zu XAMPP und enthält den Apache-Webserver, ' .
    'MySQL-Datenbank-Server, PHP sowie PHPmyAdmin.', 
    'de'
);

and you will be fine.

Note: The path in $cacheDir is an absolute path, as required by file_put_contents().

@vanderlee
Owner

@alexander-nitsche, do you think the language update script could also run the caching step to pre-generate the cache files? That would eliminate the need for write permissions in most use cases, unless somebody wants to provide their own .tex files.

@tnbnicer
Author

Hi, if I may get involved: there's money in this. If users could upload their own .tex files, or vocabulary lists to splice onto personalized .tex files, that would enable a hyphenation service, perhaps for budding e-book authors. Amazon's hyphenation is a disgrace. Authors can do a lot better if they hyphenate manually.

At the moment my hosting provider is being stubborn. I haven't even gotten as far as writing/caching *.json files. Moving the "Cache" to the public_html area has helped, but there's still some kind of issue with caching and write permissions.

A useful thing I found along the way is that the LibreOffice English hyphenation .tex dictionaries are somewhat better than the existing ones. Should anyone be interested, you can download them from https://github.com/LibreOffice/dictionaries.

As soon as (or if) I get the *.json caching fixed on my server, I'll post the result. Note that the fix, if there is one, may not apply to this issue in general, since the problem seems to be partly specific to shared hosting. WAMP works great.

@alexander-nitsche
Collaborator

Hi @vanderlee, what do you want to achieve? I'm not sure I follow. :)

Hi @tnbnicer, what do you want to achieve? Who would want to load their own tex files? You can already do that: simply create a dedicated language files directory, e.g. __DIR__.'/user/phpSyllable-master/languages', that contains the custom tex files, and point your Syllable object to it via

<?php
..
function hyphenateText($text, $language) {
  ..
  $languageDir = __DIR__.'/user/phpSyllable-master/languages';
  ..
}
..
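Before pointing getSource()->setPath() at a custom directory, it can help to confirm that the expected pattern file is actually there. A small sketch, assuming pattern files follow the hyph-<language>.tex naming seen elsewhere in this thread (e.g. hyph-de-1996.tex); languageFileFor() is a hypothetical helper:

```php
<?php
// Hypothetical check: return the path of the pattern file for $language,
// or null if it is missing. Assumes the "hyph-<language>.tex" naming scheme.
function languageFileFor(string $languageDir, string $language): ?string
{
    $file = rtrim($languageDir, '/\\') . '/hyph-' . $language . '.tex';
    return is_file($file) ? $file : null;
}
```

For example, only call $syllable->getSource()->setPath($languageDir) when languageFileFor($languageDir, 'de') is not null.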

By the way, there is already such a hyphenation service at https://syllable.toyls.com/, which is unfortunately broken at the moment with this console error:

XHR
POST https://syllable.toyls.com/?path=syllable/hyphenate
[HTTP/2 500 Internal Server Error 40ms]

@tnbnicer
Author

@alexander-nitsche
I have yet to come across such a service. People are being duped by Amazon and other e-book publishers into thinking hyphenation is an automatic feature. It is anything but, if you do it right. Think of all the foreign words or coinages an author might have in an e-book. No way can an algorithm hyphenate them all correctly.

In the old days people who did hyphenation were called typesetters.

Another point.
We are talking about a list of perhaps 1,000 or more words. Can someone who is serious about publishing a 100,000-word e-book collect the exceptions they find manually in the text, upload that list to a service, and have the service hyphenate the e-book's HTML files in a given language, including the manually hyphenated words? Sure, there are hyphenators, but adding exceptions is a manual chore first. How would you like it if you were the author of a book in English with a Latin word in it hyphenated incorrectly?

I'm speculating, of course, that Amazon's hyphenation is algorithmic. Most probably it is, and it uses patterns just like we do.

@tnbnicer
Author

tnbnicer commented Apr 20, 2023

@vanderlee and @alexander-nitsche, hi.
How do you feel about changing the function in 'phpSyllable-master/src/Cache/File.php' on line 22 from:

    public function setPath($path)
    {
        if ($path !== self::$path) {
            self::$path = $path;
            self::$data = null;
        }
    }

to:

    public function setPath($path)
    {
        if ($path !== __DIR__) {
            self::$path = __DIR__;
            self::$data = null;
        }
    }

What do you think? It's just a suggestion, not a pull request. It avoids the error that self::$path is not a valid string.

Caching appears to have been behind my server problems. It's fixed now, as long as the path is correct and caching is disabled.

@tnbnicer
Author

I can close the issue if you want, or you can.

@alexander-nitsche
Collaborator

@tnbnicer: Thanks for the details. So you would like to have two features:

  1. The syllable API should be able to consider an arbitrary number of custom tex files alongside the main language tex file.
  2. The syllable API should be able to convert a simple user-defined array of custom hyphenations into a custom tex file. For example an array like
<?php
$myCustomLatinHyphenationsInAFrenchText = [
    'forum' => ['fo', 'rum'],
    'romanum' => ['ro', 'ma', 'num'],
];
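TeX pattern files conventionally list word-specific exceptions in a \hyphenation{...} block, one hyphenated word per entry. A sketch of feature 2 under that assumption, turning an array like the one above into such a block; toTexHyphenation() is hypothetical, and whether phpSyllable's .tex parser reads a block written this way would need to be checked against its source:

```php
<?php
// Hypothetical converter: ['word' => [syllables...]] -> TeX \hyphenation{...} block.
function toTexHyphenation(array $hyphenations): string
{
    $entries = [];
    foreach ($hyphenations as $word => $syllables) {
        // Guard against syllables that do not spell the word they are listed under.
        if (implode('', $syllables) !== $word) {
            throw new InvalidArgumentException("Syllables do not match word: $word");
        }
        $entries[] = implode('-', $syllables);
    }
    return "\\hyphenation{\n" . implode("\n", $entries) . "\n}\n";
}

echo toTexHyphenation([
    'forum'   => ['fo', 'rum'],
    'romanum' => ['ro', 'ma', 'num'],
]);
```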

@alexander-nitsche
Collaborator

alexander-nitsche commented Apr 21, 2023

@tnbnicer: As for integrating the Syllable API into your own project, you should not modify any file of the Syllable package directly. For example, do not add the cache paths directly into the \Vanderlee\Syllable\Cache\File::filename() method; instead, do it from outside the package, where you instantiate the Syllable object, and set the cache path via the cache object, like this:

<?php
use Vanderlee\Syllable\Syllable;

$remote_ip = "35.214.1.93";
$server = (strpos(getenv( "SERVER_ADDR" ), $remote_ip) !== false) ? "remote" : "local";
$remote = ($server == "remote") ? true : false;

$syllable = new Syllable('nl');

if ($remote) {
  $syllable->getCache()->setPath('/home/customer/www/mydomain/apps/phpSyllable-masterTest/src/Cache');
} else {
  // do nothing and use default path
}

PHP packages are generally meant to be used by instantiating some package objects and executing their API, i.e. their public methods.

Also, you should not set the cache path anywhere inside the Syllable package, but outside it, so choose something like

if ($remote) {
  $syllable->getCache()->setPath('/home/customer/www/mydomain/cache/phpSyllable-masterTest');
} else {
  // do nothing and use default path
}

and create that path manually or by PHP code (as shown above).

Also, you should not hard-code the full path but use the __DIR__ magic constant (or something similar), which contains the directory of the current PHP file. For example, if your application PHP file resides in the project root /home/customer/www/mydomain/ on the server, you should use

if ($remote) {
  $syllable->getCache()->setPath(__DIR__.'/cache/phpSyllable-masterTest');
} else {
  // do nothing and use default path
}

This prevents revealing sensitive information and keeps your application flexible. You could probably use the same configuration on your local webserver too, removing the whole local/remote condition:

<?php
use Vanderlee\Syllable\Syllable;

$syllable = new Syllable('nl');
$syllable->getCache()->setPath(__DIR__.'/cache/phpSyllable-masterTest');

@tnbnicer
Author

tnbnicer commented Apr 21, 2023

@alexander-nitsche
I have to scoot off to do some real work in a moment, so this will be quick. I'm not sure I get the details of your proposals yet; this evening I will examine the ideas more closely. I'm going to do something unspeakably egocentric here and, if GitHub lets me, hopefully get away with it by posting a link to what I have online. It isn't meant to make money; it's only a demo: https://www.teanow5pm.co.uk/pid8.php

Question: would it be possible for the API programmatically to integrate an existing dictionary, say 'en-us.tex', with a customer's uploaded vocabulary list, words to be hyphenated? Example:

Alakasandu
Alexander
Alexandria
Altkleidersammlung
ambivalently
amphitheater
Amsterdam
Amalthea
Amerikanistik
amethysts
Amtsgericht

The customer would also (or only) provide the intended hyphenations as dash-separated syllables:

Alak-a-sandu
Alex-an-der
Alex-an-dria
Alt-klei-der-samm-lung
am-biv-a-lently
am-phi-the-ater
Am-ster-dam
Amal-thea
Ame-ri-ka-nis-tik
ame-thysts
Amts-ge-richt
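A dash-separated list like this maps directly onto the word => syllables array shape proposed above for custom hyphenations. A minimal parser, assuming one entry per line and that every hyphen marks a syllable break; parseHyphenatedList() is a hypothetical name:

```php
<?php
// Hypothetical parser: "Alex-an-der" lines -> ['Alexander' => ['Alex', 'an', 'der']].
function parseHyphenatedList(string $text): array
{
    $result = [];
    foreach (preg_split('/\R/u', $text) as $line) {   // \R matches any newline style
        $line = trim($line);
        if ($line === '') {
            continue;                                  // skip blank lines
        }
        $syllables = explode('-', $line);
        $result[implode('', $syllables)] = $syllables; // key is the unhyphenated word
    }
    return $result;
}
```

Words that contain a real hyphen, such as Webserver-Software, would be split incorrectly by this scheme, so a service accepting such input would need a different marker for syllable breaks.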

See you later.

@tnbnicer
Author

tnbnicer commented Apr 22, 2023

@alexander-nitsche
Sorry for the delay.

Hi, I use a similar array approach for multi-word exceptions, when context determines where the hyphen goes or when a word is shorter than the minimum length.

'Straße' => 'Stra-ße' will not be hyphenated if the minimum word length is greater than 6. A longer string was needed to catch it:

'Berliner Straße' => 'Ber-liner Stra-ße'

The word 'records' might be a verb or a noun, 're-cords' or 'rec-ords'. Array:

'he records' => 'he re-cords',
'many records' => 'many rec-ords'
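PHP's strtr() with an array argument suits this kind of context-dependent list: it tries the longest matching key first and never re-scans text it has already replaced, so 'he records' and 'many records' can coexist. A sketch, assuming a plain '-' as the break marker as in the examples above; applyExceptions() is a hypothetical helper:

```php
<?php
// Hypothetical helper: apply multi-word exceptions; strtr() prefers the
// longest matching key and does not re-scan replaced text.
function applyExceptions(string $text, array $exceptions): string
{
    return strtr($text, $exceptions);
}

$exceptions = [
    'Berliner Straße' => 'Ber-liner Stra-ße',
    'he records'      => 'he re-cords',
    'many records'    => 'many rec-ords',
];

echo applyExceptions('Yesterday he records many records.', $exceptions), PHP_EOL;
// prints "Yesterday he re-cords many rec-ords."
```

Matching is exact and case-sensitive, so a sentence-initial "He records" would need its own entry.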

As for the issue at hand (I'm still stumped by it, by the way): why wouldn't the server resolve self::$path? Is it an error? Caching? Writing or generating a new file with caching enabled is pretty useless, yet the error persists even with caching disabled.

Is it a real issue, or something my server specifically fumbles? I don't know. I agree that setting the cache path outside the Syllable package is the best approach in general.

I apologize for the link above.

@alexander-nitsche
Collaborator

Question: would it be possible for the API programmatically to integrate an existing dictionary, say 'en-us.tex', with a customer's uploaded vocabulary list, words to be hyphenated? Example:

This is a feature request and is not currently supported. Would you create a corresponding issue with a description and example input and output?

@alexander-nitsche
Collaborator

As to the issue at hand -- I'm still stumped by it, by the way -- why wouldn’t the server resolve: "self::$path"? An error. Caching? Writing/generating a new file with caching enabled is pretty useless. The error persists however with caching disabled.

Have you tried the example code I posted above?

@tnbnicer
Author

@alexander-nitsche
I made a little progress: with your code, self::$path resolves. My mistake was reusing $cache as a variable:

$cache = $syllable->getCache();
$cache->setPath($cache); // mistake: passes the cache object itself instead of a path string

With $syllable->getCache()->setPath(__DIR__.'/cache'); (your code) there is no problem with "self::$path" in File.php.

If I'm not mistaken, @chmod($file, 0777) gives PHP permission to overwrite the file, but it does not appear to be doing so: you have to delete or rename the original *.json file before a new one is written. To force a rewrite, I want to add an unlink($file); somewhere in the code, so that written files would be deleted immediately. Another way to always rewrite is to give each *.json file a unique name:

$newfile = preg_replace('/\.json$/', '-'.time().'.json', $file); // escaped dot, anchored at the end
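Deleting the cache after every write, or adding a timestamp to each file name, means the cache is either never reused or keeps growing. An alternative, sketched here under the assumption that both the .tex source and the .json cache live on disk, is to rebuild only when the source file is newer than the cache; isCacheStale() is a hypothetical helper:

```php
<?php
// Hypothetical staleness check: rebuild the JSON cache only when it is
// missing or older than the .tex pattern file it was generated from.
function isCacheStale(string $cacheFile, string $sourceFile): bool
{
    clearstatcache();                 // make sure filemtime() sees fresh values
    if (!is_file($cacheFile)) {
        return true;
    }
    return filemtime($sourceFile) > filemtime($cacheFile);
}
```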

I made progress as well on adding hyphenation files (exceptions) uploaded or pasted in a text field.

There are probably various ways to accomplish it. I run LoadLanguage() twice: once for the original *.tex file conversion and again for the exceptions. When you do this with 'hyph-de-1996.tex', there is a small error that reads: Warning: Undefined variable $braces in F:\wamp - 303\www\phpSyllable-masterTest\src\Source\File.php on line 129. It is caused by too many percent ('%') and whitespace characters at the start of the *.tex file; they have to be deleted. French contains a lot of whitespace and percent characters but runs smoothly. ...

Again I want to apologize for burdening you with a link.

I really appreciate your help. Your suggested solution was right on the button. I feel somewhat embarrassed, and yet happy it works. Thank you!

@tnbnicer
Author

Update: hyph-de-1996.tex:

65. % ===========================================================================
66.
67. \message{German Hyphenation Patterns (Reformed Orthography, 2006) `dehyphn-x' 2021-02-26 (WL)}

Line 66: after I removed that line, the file compiled without a warning.

@tnbnicer
Author

@alexander-nitsche, @vanderlee,
Issue #64 is ready to be marked as solved, in my view.

    public function close()
    {
        $file = $this->filename();
        file_put_contents($file, $this->encode(self::$data));
        @chmod($file, 0777);
    }

I can't say exactly how, but the close() function doesn't run, and no file is written, if the preceding open() function finds an existing version of the file and reads it.

    public function open($language)
    {
        $language = strtolower($language);

        if (self::$language !== $language) {
            self::$language = $language;
            self::$data = null;

            $file = $this->filename();
            if (is_file($file)) {
                self::$data = $this->decode(file_get_contents($file));
            }
        }
    }

In other words, if open() opens a file, close() won't rewrite it. (It took me a while to figure out.) So the bottom line is that it's not an issue; it's the intended behavior.

@alexander-nitsche

Would you create a corresponding issue with a description and example input and output?

My hosting provider suggests @chmod($file, 0644). 0755 for folders and 0644 for files should be ample, even for re-writing files. It's not an issue though.
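In PHP these modes must be written as octal literals (with a leading 0). 0644 lets the file's owner rewrite it and everyone else read it, so it is enough for rewriting provided the PHP process runs as that owner, which is common on shared hosts but not guaranteed. A minimal sketch with a hypothetical applyCachePermissions() helper:

```php
<?php
// Hypothetical helper applying the host's suggested modes.
// Leading 0 = octal literal; chmod() sets the mode exactly (umask does not apply).
function applyCachePermissions(string $dir, string $file): void
{
    chmod($dir, 0755);   // owner: rwx, group/others: r-x
    chmod($file, 0644);  // owner: rw-, group/others: r--
}
```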

@alexander-nitsche
Collaborator

In other words, if open() opens a file, close() won't re-write a new one. (It took me awhile to figure out.) So the bottom line is it's not an issue, rather it's the intended behavior.

Yes, that seems to be correct. The only situations where a cache file is rewritten are (a) if it does not exist or (b) if the cache file read does not match the current cache version or cache file structure. So I am still assuming that your request for an additional set of hyphenation patterns has not yet been implemented and would best fit into a feature request.

@tnbnicer
Author

@alexander-nitsche

(b) if the cache file read does not match the current cache version or cache file structure.

Are you sure? Even if the file exists?

public function open($language) does not check for changes. It simply reads the existing file if the if (is_file($file)) check passes.

There is a slight danger in rewriting: online, two users might access the same file at the same time. If one user chooses a different setting from the other, what would happen?
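One part of that concern, a reader seeing a half-written file, can be avoided by writing to a temporary file in the same directory and renaming it over the target, since rename() within one file system replaces the file atomically on POSIX systems. A sketch with a hypothetical writeAtomically() helper, assuming the cache directory already exists:

```php
<?php
// Hypothetical atomic write: readers see either the old file or the complete new one.
function writeAtomically(string $file, string $contents): void
{
    $tmp = tempnam(dirname($file), 'syl');   // temp file in the same directory
    if ($tmp === false || file_put_contents($tmp, $contents) === false) {
        throw new RuntimeException("Could not write temporary file for $file");
    }
    chmod($tmp, 0644);                       // tempnam() creates files as 0600
    if (!rename($tmp, $file)) {              // replaces the target in one step on POSIX
        @unlink($tmp);
        throw new RuntimeException("Could not replace $file");
    }
}
```

This does not resolve conflicting settings: the last writer still wins.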

This would be a good time to admit that I made changes to the API that let a user personalize the 'minimum hyphen' values, i.e. change the "left minimum" and "right minimum" on the client. It works, but since the values are currently read from a cached 'syllable.[$language].json' on the server, the updated setting is lost.

Note that the fix requires changes to the supported version of the API on GitHub. As a feature request it changes too much to be practical; otherwise I would file one.
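One way to keep per-user "left minimum"/"right minimum" settings from being lost or overwritten, possibly without restructuring the API much, would be to key the cache file name on those settings as well as the language, extending the 'syllable.[$language].json' pattern mentioned above. A sketch; cacheFilename() and the naming scheme are hypothetical:

```php
<?php
// Hypothetical naming scheme: one cache file per language *and* min-hyphen settings,
// so users with different settings never overwrite each other's cache.
function cacheFilename(string $language, int $leftMin, int $rightMin): string
{
    return sprintf('syllable.%s.l%d-r%d.json', strtolower($language), $leftMin, $rightMin);
}

echo cacheFilename('en-US', 2, 3), PHP_EOL; // syllable.en-us.l2-r3.json
```

The trade-off is one extra cache file per settings combination actually used.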

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants