automattic/php-thrift-sql - Packagist


automattic / php-thrift-sql
A PHP library for connecting to Hive or Impala over Thrift
Details
github.com/Automattic/php-thrift-sql (Homepage, Source, Issues)
Installs: 50 307
Stars: 111
Watchers: 126
Forks: 41
v0.3.1 (2019-07-31 15:57 UTC)
Requires: php >=5.5.0
Requires (Dev): theseer/autoload 1.*
Suggests: None
Provides: None
Conflicts: None
Replaces: None
License: GPL-2.0-or-later
Source reference: c2d8d83260b8227a5753fdc2d96047676f1b3a20
Author: Xiao Yu <xyu.woop@automattic.com>
Keywords: database, sql, thrift, hive, Impala
Versions: dev-master, v0.3.1, v0.3.0, v0.2.1, v0.2.0, v0.1.3, v0.1.2
This package is auto-updated.
Last update: 2022-12-20 14:31:38 UTC
README
The ThriftSQL.phar archive aims to provide access to SQL-on-Hadoop frameworks for PHP. It bundles Thrift and various service packages together and exposes a common interface for running queries over the various frameworks.
Currently the following engines are supported:
Hive -- Over the HiveServer2 Thrift interface. SASL is enabled by default, so a username and password must be provided; SASL can be turned off with the setSasl() method before calling connect().
Impala -- Over the Impala Service Thrift interface which extends the Beeswax protocol.
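The following is a minimal sketch of what "SASL enabled" means on the wire for a username/password login. It assumes the SASL PLAIN mechanism (RFC 4616) and the negotiation framing used by Thrift's SASL transport: a 1-byte status, a 4-byte big-endian length, then the payload. It is not code from ThriftSQL. The functions saslPlainPayload() and saslFrame() and the SASL_START/SASL_OK constant names are hypothetical.

```php
<?php
// Hypothetical helpers illustrating a SASL PLAIN handshake as framed by
// Thrift's SASL transport. Not part of ThriftSQL's API.

const SASL_START = 1; // status byte that opens the negotiation
const SASL_OK    = 2; // status byte for a regular negotiation message

// RFC 4616 PLAIN message: [authzid] NUL authcid NUL passwd (authzid left empty).
function saslPlainPayload($user, $pass)
{
    return "\0" . $user . "\0" . $pass;
}

// Frame = 1-byte status + 4-byte big-endian payload length + payload.
function saslFrame($status, $payload)
{
    return pack('CN', $status, strlen($payload)) . $payload;
}

// Client side of the exchange: announce the mechanism, then send credentials.
$frames = [
    saslFrame(SASL_START, 'PLAIN'),
    saslFrame(SASL_OK, saslPlainPayload('user', 'pass')),
];
echo bin2hex($frames[0]), "\n"; // 0100000005504c41494e
```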
Version Compatibility
This library is currently compiled against the Thrift definitions of the following database versions:
Apache Hive 1.1.0 (Mar 2015)
Apache Impala 2.12.0 (Apr 2018)
It uses the compiler and base PHP classes of:
Apache Thrift 0.12.0 (Oct 2018)
Usage Example
The recommended way to use this library is to get results from Hive/Impala through the memory-efficient iterator. The iterator keeps the connection open and fetches results a few rows at a time. Large result sets can then be processed one record at a time, which minimizes PHP's memory consumption.
<?php
// Load this lib
require_once __DIR__ . '/ThriftSQL.phar';
// Try out a Hive query via iterator object
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$hiveTables = $hive
  ->connect()
  ->getIterator( 'SHOW TABLES' );
// Try out an Impala query via iterator object
$impala = new \ThriftSQL\Impala( 'impala.host.local' );
$impalaTables = $impala
  ->connect()
  ->setOption( 'MEM_LIMIT', '2gb' ) // optionally set some query options
  ->getIterator( 'SHOW TABLES' );
// Execute the Hive query and iterate over the result set
foreach( $hiveTables as $rowNum => $row ) {
  print_r( $row );
}
// Execute the Impala query and iterate over the result set
foreach( $impalaTables as $rowNum => $row ) {
  print_r( $row );
}
// Don't forget to close socket connection once you're done with it
$hive->disconnect();
$impala->disconnect();
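The iterator saves memory by fetching rows in small batches rather than all at once. The sketch below illustrates that pattern with a PHP generator. It is not ThriftSQL's implementation. fakeFetchBatch() is a hypothetical stand-in for the Thrift fetch call, backed here by an in-memory array so the example runs without a server.

```php
<?php
// Hypothetical stand-in for a server-side cursor: returns up to $size rows
// starting at $offset, or an empty array once the result set is exhausted.
// In ThriftSQL this role is played by Thrift fetch calls; this is NOT its API.
function fakeFetchBatch(array $table, $offset, $size)
{
    return array_slice($table, $offset, $size);
}

// Yield rows one at a time while holding at most one batch in memory.
function iterateInBatches(callable $fetchBatch, $batchSize)
{
    $offset = 0;
    while (true) {
        $batch = $fetchBatch($offset, $batchSize);
        if ($batch === []) {
            return; // nothing left on the "server"
        }
        foreach ($batch as $row) {
            yield $offset => $row;
            $offset++;
        }
    }
}

$table = [['t1'], ['t2'], ['t3'], ['t4'], ['t5']];
$rows = iterateInBatches(function ($offset, $size) use ($table) {
    return fakeFetchBatch($table, $offset, $size);
}, 2);

foreach ($rows as $rowNum => $row) {
    echo "$rowNum: {$row[0]}\n";
}
```

The batch size trades round trips against memory. Larger batches mean fewer fetch calls but more rows held in PHP at once.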
The downside of the memory-efficient iterator is that the result set can only be iterated over once. Results are not cached within the PHP client. If foreach is called a second time on the same iterator object, an exception is thrown by default to prevent the same query from executing on Hive/Impala again. This check can be turned off with allowRerun( true ). Be aware that iterating over the same iterator object again then reruns the query and may produce different results.
Consider the following example:
// Connect to hive and get a rerun-able iterator
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$results = $hive
  ->connect()
  ->getIterator( 'SELECT UNIX_TIMESTAMP()' )
  ->allowRerun( true );
// Execute the Hive query and get results
foreach( $results as $rowNum => $row ) {
  echo "Hive server time is: {$row[0]}\n";
}
sleep(3);
// Execute the Hive query a second time
foreach( $results as $rowNum => $row ) {
  echo "Hive server time is: {$row[0]}\n";
}
Which will output something like:
Hive server time is: 1517875200
Hive server time is: 1517875203
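The single-pass guard described above can be modelled roughly as follows. This is a hypothetical sketch, not the library's code. SinglePassQuery and its constructor callback are illustrative names. Only the allowRerun() name is borrowed from the example above. The sketch assumes PHP 7 or later because of the return type.

```php
<?php
// Hypothetical model of an iterator that refuses to rerun its query unless
// explicitly allowed. Illustration only; not ThriftSQL's implementation.
class SinglePassQuery implements IteratorAggregate
{
    private $runQuery;          // callable that executes the query, returns rows
    private $allowRerun = false;
    private $hasRun = false;

    public function __construct(callable $runQuery)
    {
        $this->runQuery = $runQuery;
    }

    public function allowRerun($allow)
    {
        $this->allowRerun = (bool) $allow;
        return $this; // fluent, so it can end a method chain
    }

    // A generator: the body (and thus the rerun check) runs when iteration starts.
    public function getIterator(): Traversable
    {
        if ($this->hasRun && !$this->allowRerun) {
            throw new LogicException('Results are not cached; refusing to rerun the query.');
        }
        $this->hasRun = true;
        // Each pass executes the query again, so results may differ.
        foreach (call_user_func($this->runQuery) as $rowNum => $row) {
            yield $rowNum => $row;
        }
    }
}

// Usage: a fake "query" that returns a different value on every execution.
$executions = 0;
$query = new SinglePassQuery(function () use (&$executions) {
    $executions++;
    return [[$executions]];
});

foreach ($query as $row) {
    echo "run {$row[0]}\n";           // run 1
}
try {
    foreach ($query as $row) {}
} catch (LogicException $e) {
    echo "second pass blocked\n";
}
foreach ($query->allowRerun(true) as $row) {
    echo "run {$row[0]}\n";           // run 2
}
```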
If the result set is small and it is easier to load all of it into PHP memory, use the queryAndFetchAll() method. It returns a plain, numerically indexed multidimensional array containing the full result set.
// Try out a small Hive query
$hive = new \ThriftSQL\Hive( 'hive.host.local', 10000, 'user', 'pass' );
$hiveTables = $hive
->connect()
->queryAndFetchAll( 'SHOW TABLES' );
$hive->disconnect();
// Print out the cached results
print_r( $hiveTables );
// Try out a small Impala query
$impala = new \ThriftSQL\Impala( 'impala.host.local' );
$impalaTables = $impala
->connect()
->queryAndFetchAll( 'SHOW TABLES' );
$impala->disconnect();
// Print out the cached results
print_r( $impalaTables );
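The difference between the two approaches is where the rows are stored. The hypothetical fetchAll() helper below shows the materializing side. Once the rows are copied into a plain array, they can be looped over any number of times without rerunning the query. The cost is holding the whole result set in PHP memory. This is an illustration only, not how queryAndFetchAll() is implemented. rowsFromServer() is a hypothetical stand-in for a single-pass result stream.

```php
<?php
// Hypothetical helper: copy every row from an iterator (or array) into a
// plain numerically indexed array. Illustration only, not ThriftSQL code.
function fetchAll($rows)
{
    $all = [];
    foreach ($rows as $row) {
        $all[] = $row; // re-indexes as 0..n-1 whatever keys the source used
    }
    return $all;
}

// Hypothetical stand-in for a server-backed, single-pass result stream.
function rowsFromServer()
{
    yield 10 => ['default'];
    yield 11 => ['logs'];
}

$tables = fetchAll(rowsFromServer());
// The cached array can be iterated repeatedly without touching the server.
foreach ($tables as $row) {
    echo $row[0], "\n";
}
foreach ($tables as $row) {
    echo $row[0], "\n";
}
```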
Developing & Contributing
To rebuild this library, you will need Composer to install the dev dependencies and Apache Thrift to compile the client libraries from the Thrift interface definition files.
Once dev tools are installed, make sure you get all git submodules:
$ git submodule init
$ git submodule update
And then the phar can be rebuilt using make:
$ make clean && make phar
NOTE: If you get a BadMethodCallException, it may be caused by any of the reasons listed in the PHP documentation. It can also be caused by a low soft limit on open file descriptors, because Phar::compressFiles() keeps all files open until it writes the compressed phar.
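As a rough pre-flight check before running make phar, the soft open-files limit can be read from PHP. This is a hypothetical sketch. It assumes the POSIX extension (posix_getrlimit()) is available. The helper names openFilesSoftLimit() and fdLimitTooLow() are illustrative, and the file count to compare against is up to you.

```php
<?php
// Hypothetical pre-flight check for `make phar`: compare the number of files
// to be compressed with the soft limit on open file descriptors.

// Returns the soft open-files limit (int or 'unlimited'), or null if unknown.
function openFilesSoftLimit()
{
    if (!function_exists('posix_getrlimit')) {
        return null; // POSIX extension not loaded (e.g. on Windows)
    }
    $limits = posix_getrlimit();
    return isset($limits['soft openfiles']) ? $limits['soft openfiles'] : null;
}

// True if keeping $fileCount files open at once would hit the soft limit.
// Ignores descriptors PHP already holds (stdin, stdout, ...), so it is optimistic.
function fdLimitTooLow($fileCount, $softLimit)
{
    if ($softLimit === null || $softLimit === 'unlimited') {
        return false; // unknown or unlimited: nothing to warn about
    }
    return $fileCount >= (int) $softLimit;
}

$limit = openFilesSoftLimit();
echo 'Soft open-files limit: ', var_export($limit, true), "\n";
var_dump(fdLimitTooLow(5000, 1024)); // bool(true)
```

On most Unix shells, ulimit -n shows the current soft limit. ulimit -n followed by a number raises it, up to the hard limit, for that shell session.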