Apache Hive



Apache Hive™ is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale and
facilitates reading, writing, and managing petabytes of data residing in distributed storage using SQL.
What is Hive?
Apache Hive is a distributed, fault-tolerant data warehouse system that enables analytics at a massive scale.
Hive Metastore (HMS) provides a central repository of metadata that can easily be analyzed to make informed,
data-driven decisions, and therefore it is a critical component of many data lake architectures.
Hive is built on top of Apache Hadoop and supports storage on S3, ADLS, GS, etc. through HDFS.
Hive allows users to read, write, and manage petabytes of data using SQL.
Key Features
Hive-Server 2 (HS2)
HS2 supports multi-client concurrency and authentication.
It is designed to provide better support for open API clients like JDBC and ODBC.
beeline -u "jdbc:hive2://host:10001/default"
Connected to: Apache Hive
jdbc:hive2://host:10001/> select count(*) from test_t1;
Hive Metastore Server (HMS)
The Hive Metastore (HMS) is a central repository of metadata for Hive tables and partitions in a relational database,
and provides clients (including Hive, Impala and Spark) access to this information using the metastore service API.
It has become a building block for data lakes that utilize the diverse world of open-source software, such as Apache Spark and Presto.
In fact, a whole ecosystem of tools, open-source and otherwise, is built around the Hive Metastore.
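As a rough illustration of the kind of information the metastore holds, the statements below can be run from a Beeline session; they are answered from table and partition metadata. The table test_t1 comes from the Beeline example on this page, while sales_by_day is a hypothetical partitioned table used only for illustration.

```sql
-- List the databases and tables registered in the metastore.
SHOW DATABASES;
SHOW TABLES IN default;

-- Show columns, storage format, location, and table properties for one table.
DESCRIBE FORMATTED test_t1;

-- List the partitions of a partitioned table (sales_by_day is a hypothetical name).
SHOW PARTITIONS sales_by_day;
```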
Hive ACID
Hive provides full ACID support for ORC tables and insert-only support for all other formats.
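The difference between the two modes can be sketched as follows. This is a minimal example, assuming transactions are enabled on the cluster (hive.support.concurrency set to true and hive.txn.manager set to org.apache.hadoop.hive.ql.lockmgr.DbTxnManager); the table names acid_demo and insert_only_demo are hypothetical.

```sql
-- Full ACID table: must be stored as ORC; supports INSERT, UPDATE, and DELETE.
CREATE TABLE acid_demo (id INT, name STRING)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

INSERT INTO acid_demo VALUES (1, 'a'), (2, 'b');
UPDATE acid_demo SET name = 'c' WHERE id = 2;
DELETE FROM acid_demo WHERE id = 1;

-- Insert-only transactional table: works with other formats such as Parquet,
-- but rows cannot be updated or deleted individually.
CREATE TABLE insert_only_demo (id INT, name STRING)
STORED AS PARQUET
TBLPROPERTIES ('transactional'='true',
               'transactional_properties'='insert_only');

INSERT INTO insert_only_demo VALUES (1, 'a');
```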
Hive Data Compaction
Query-based and MR-based data compactions are supported out-of-the-box.
jdbc:hive2://> alter table test_t1 compact "MAJOR";
Done!
jdbc:hive2://> alter table test_t1 compact "MINOR";
jdbc:hive2://> show compactions;
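For partitioned tables, compaction is requested per partition. A minor compaction merges a partition's delta files into a single delta, while a major compaction rewrites the deltas and the base into a new base. The sketch below assumes a hypothetical transactional table named events that is partitioned by a string column ds.

```sql
-- Request a major compaction for one partition (events and ds are hypothetical names).
ALTER TABLE events PARTITION (ds = '2023-01-01') COMPACT 'major';

-- Check the state of queued, running, and finished compactions.
SHOW COMPACTIONS;
```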
Hive Iceberg
Hive provides out-of-the-box support for Apache Iceberg tables, a cloud-native,
high-performance open table format, via the Hive StorageHandler.
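A minimal sketch of creating and querying an Iceberg table through the storage handler is shown below. The table name ice_orders and its columns are hypothetical, and the example assumes the Iceberg integration is available on the Hive classpath. Hive 4 also accepts the shorthand STORED BY ICEBERG in place of the handler class name.

```sql
-- Create an Iceberg table by naming the Iceberg storage handler explicitly.
CREATE TABLE ice_orders (order_id BIGINT, customer STRING, amount DOUBLE)
STORED BY 'org.apache.iceberg.mr.hive.HiveIcebergStorageHandler';

-- Write and read the table with ordinary HiveQL.
INSERT INTO ice_orders VALUES (1, 'alice', 19.99);
SELECT customer, amount FROM ice_orders;
```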
Security and Observability
Apache Hive supports Kerberos authentication and integrates with Apache Ranger and Apache Atlas for security and
observability.
Hive LLAP
Apache Hive enables interactive and subsecond SQL through Low Latency Analytical Processing (LLAP),
introduced in Hive 2.0, which makes Hive faster by using persistent query infrastructure and optimized data caching.
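As a hedged sketch, a session can be pointed at LLAP with the settings below. This assumes LLAP daemons are already running on the cluster and that the administrator allows these properties to be changed per session; in many deployments they are set cluster-wide instead. The table test_t1 is the one used in the Beeline example on this page.

```sql
-- Run queries on Tez and send their work to the LLAP daemons.
SET hive.execution.engine=tez;
SET hive.execution.mode=llap;
-- 'all' asks Hive to run every eligible operator inside LLAP.
SET hive.llap.execution.mode=all;

SELECT count(*) FROM test_t1;
```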
Query planner and Cost based Optimizer
Hive uses Apache Calcite's cost-based query optimizer (CBO) and query execution framework to optimize SQL queries.
jdbc:hive2://> explain cbo select ss.ss_net_profit, sr.sr_net_loss from store_sales ss join store_returns sr on (ss.ss_item_sk=sr.sr_item_sk) limit 5 ;
+---------------------------------------------+
Explain
CBO PLAN:
HiveSortLimit(fetch=[5])
HiveProject(ss_net_profit=[$1], sr_net_loss=[$3])
  HiveJoin(condition=[=($0, $2)], joinType=[inner])
    HiveProject(ss_item_sk=[$2], ss_net_profit=[$22])
    HiveFilter(condition=[IS NOT NULL($2)])
      HiveTableScan(table=[[tpcds_text_10, store_sales]], table:alias=[ss])
    HiveProject(sr_item_sk=[$2], sr_net_loss=[$19])
      HiveTableScan(table=[[tpcds_text_10, store_returns]], table:alias=[sr])
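The cost-based optimizer relies on table and column statistics to estimate the cost of a plan, so statistics are usually gathered before plans such as the one above are inspected. A minimal sketch, assuming the store_sales and store_returns tables from the example exist in the current database:

```sql
-- Make sure the Calcite-based optimizer is enabled for the session.
SET hive.cbo.enable=true;

-- Gather table-level and column-level statistics for the joined tables.
ANALYZE TABLE store_sales COMPUTE STATISTICS;
ANALYZE TABLE store_sales COMPUTE STATISTICS FOR COLUMNS;
ANALYZE TABLE store_returns COMPUTE STATISTICS;
ANALYZE TABLE store_returns COMPUTE STATISTICS FOR COLUMNS;
```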
Hive Replication
Hive supports bootstrap and incremental replication for backup and recovery.
jdbc:hive2://> repl dump src with (
. . .> 'hive.repl.dump.version'= '2',
. . .> 'hive.repl.rootdir'= 'hdfs://<host>:<port>/user/replDir/d1'
. . .> );
jdbc:hive2://> repl load src into tgt with (
Apache is a non-profit organization helping open-source software projects released under the Apache license and managed with open governance and privacy policy. See upcoming Apache Events. If you discover any security vulnerabilities, please report them privately. Finally, thanks to the sponsors who donate to the Apache Foundation.
The contents of this website are © 2023 Apache Software Foundation under the terms of the Apache License v2. Apache Hive and its logo are trademarks of the Apache Software Foundation.