Java开发网 (Java Developer Network)
My caching approach and source code (Java version, with test JSP)


Posts: 20
Posted: 2008-07-24 13:48











select id from T where topicId=2008 order by createTime desc limit 0,5
select id from T where topicId=2008 order by createTime desc limit 5,5
select id from T where topicId=2008 order by createTime desc limit 10,5

select id from T order by createTime desc limit 0,50

select id from T where userId=2046 order by createTime desc limit 0,15
select id from T where userId=2046 order by createTime desc limit 15,15
select id from T where userId=2046 order by createTime desc limit 30,15

select id from T where topicId=2008 and userId=2046 order by createTime desc limit 0,15
select id from T where topicId=2008 and userId=2046 order by createTime desc limit 15,15

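Summary: this caching approach can store lists at large scale with a very high cache hit rate, so it can sustain very large applications. However, developers have to configure, according to their own business logic, which fields to hash on. Usually you hash on a table's index key (mind the order: put the most widely dispersed field first). Taking userId as an example, the cache can hold M kinds of lists for each of N users, and if one user's data changes, the list caches of the other N-1 users are left untouched. Everything above describes how to cache lists; caching lengths (counts) follows exactly the same idea. For example, a count such as select count( * ) from T where topicId=2008 is also stored in the hash Map for topicId=2008. Combined with MySQL memory tables and memcached, plus an F5 device for distributed load balancing, the system is enough to handle applications on the scale of 10 million IPs per day; apart from search engines, ordinary websites do not reach that scale.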
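The bucketing idea in the summary can be illustrated with a minimal sketch. This is not the poster's actual source code (which is in the attached document); the class name `HashedListCache`, its methods, and the loader callbacks are hypothetical, and the hash field is assumed to be a numeric id such as userId or topicId.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical sketch: list and count caches bucketed by one hash field (e.g. userId).
class HashedListCache {
    // Outer key: value of the hash field. Inner key: the full query text, including the limit clause.
    private final Map<Long, Map<String, List<Long>>> lists = new ConcurrentHashMap<>();
    private final Map<Long, Map<String, Integer>> counts = new ConcurrentHashMap<>();

    // Returns the cached id list, calling the loader (a stand-in for the database) only on a miss.
    List<Long> getList(long hashValue, String query, Supplier<List<Long>> loader) {
        return lists.computeIfAbsent(hashValue, k -> new ConcurrentHashMap<>())
                    .computeIfAbsent(query, q -> loader.get());
    }

    // Counts ("lengths") live in the same kind of bucket as the lists.
    int getCount(long hashValue, String query, Supplier<Integer> loader) {
        return counts.computeIfAbsent(hashValue, k -> new ConcurrentHashMap<>())
                     .computeIfAbsent(query, q -> loader.get());
    }

    // When a row for this hash value changes, only its bucket is dropped;
    // the buckets of all other hash values stay cached.
    void invalidate(long hashValue) {
        lists.remove(hashValue);
        counts.remove(hashValue);
    }
}

class HashedListCacheDemo {
    public static void main(String[] args) {
        HashedListCache cache = new HashedListCache();
        String q = "select id from T where userId=2046 order by createTime desc limit 0,15";
        // The lambda stands in for a real database query.
        List<Long> ids = cache.getList(2046L, q, () -> Arrays.asList(3L, 2L, 1L));
        System.out.println(ids);
        cache.invalidate(2046L); // user 2046 changed; other users' lists are unaffected
    }
}
```

Dropping a whole bucket on any change is the simplest policy; a real system might invalidate more selectively.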





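One last word: if you really want to support me and support Chinese open-source projects, please repost this article on your own blog or bookmark it, and remember to include the download link for the document! Thank you and good luck.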



缓存思路.doc (61.5k)

liuaike edited on 2008-07-24 13:50

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 1
Posted: 2008-07-25 12:16

Re: My caching approach and source code (Java version, with test JSP) [Re:ababcc]

Posts: 20
Posted: 2008-07-26 11:06
ababcc wrote:



Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 143
Posted: 2008-07-26 11:56

Breeze edited on 2008-07-26 11:59

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 509
Posted: 2008-07-26 16:23


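On multi-core CPUs with Linux 2.6 and JDK 1.6, a reasonably well-optimized web application, counted per virtual machine (JVM) allocated 2 GB of memory, can handle roughly 5 million page views (PV) per day in a real environment.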

bluepure edited on 2008-07-26 16:27

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 1594
Posted: 2008-07-27 16:40

"First they ignore u, then they laugh at u, then they fight u, then u will win

Mahatma Gandhi"

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Java Jedi

Posts: 3233
Posted: 2008-07-29 02:17
No intention to start a flame war. Just a discussion:
The central point of this post is to use a composite key (user id + range) to increase the hit rate. I don't see any discussion of the distributed solution (分布式解决方案).

If I understand distributed caches correctly, the following are the key points of a distributed cache solution:
1. clustered: all the different JVMs carry the same, replicated content.
2. fault tolerant: more than one copy in the cache. Normally this feature and #1 are implemented as one, i.e., users can specify how many copies they want in the entire cache.
3. scalable: meaning, add machines/JVMs to expand memory, not copies. A related question is whether this is transparent to users, i.e., whether users need to change code for new cache servers.
4. network protocols: UDP/TCP
5. transactions
6. backup to files/databases asynchronously.
7. API for other languages, JDBC/ODBC drivers
8. SQL language manipulations.
9. near cache/far cache memory management.
10. locking
11. distributed event handling, such as JMS. This is essential for intersystem updates.

Another term is data grid.

Another thought is that a cache should be a reusable component. On the user API side, there should be only a handful of methods, such as get/set, getAllkeys, lock, unlock, etc. On the config side: network, backup copies, memory size, etc.
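As a rough illustration of how small such a user-facing API can be, here is a sketch of an interface with only those methods, plus a single-JVM implementation. The names `SimpleCache` and `LocalSimpleCache` are made up for this example and do not come from Tangosol or any other product; a distributed implementation would sit behind the same interface.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical user-facing API: only a handful of methods.
interface SimpleCache<K, V> {
    V get(K key);
    void set(K key, V value);
    Set<K> getAllKeys();
    void lock(K key);
    void unlock(K key);
}

// Single-JVM stand-in. Network, backup copies and memory size would be
// configuration concerns of a real implementation, not part of this API.
class LocalSimpleCache<K, V> implements SimpleCache<K, V> {
    private final ConcurrentHashMap<K, V> data = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<K, ReentrantLock> locks = new ConcurrentHashMap<>();

    public V get(K key) { return data.get(key); }

    public void set(K key, V value) { data.put(key, value); }

    // Returns a snapshot so callers cannot modify the cache through the key set.
    public Set<K> getAllKeys() { return new HashSet<>(data.keySet()); }

    // One lock per key, created on first use.
    public void lock(K key) { locks.computeIfAbsent(key, k -> new ReentrantLock()).lock(); }

    // Unlocks only if the calling thread actually holds the lock for this key.
    public void unlock(K key) {
        ReentrantLock l = locks.get(key);
        if (l != null && l.isHeldByCurrentThread()) l.unlock();
    }
}
```

A caller would typically wrap a read-modify-write in `lock(key)` and `unlock(key)` inside a try/finally block.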

From the users' perspective, a cache should be just a large chunk of memory; it doesn't matter where it sits.

The composite key idea has been around for more than 15 years. A credit card number has 19 digits internally, so they have to use this technique to decompose the huge table into many small tables so that they can locate your record within milliseconds. It's working well.

In my experience, the best solution so far, based on the above conditions, is still Tangosol. Of course, this doesn't mean it has all the listed features, but it's close enough, and some of the features can be added from outside. I've been using composite keys of more than 30 fields to locate data. From what I heard, this has been the case in most big financial firms since the data is huge: size does matter and it gets ugly very quickly.

Just some experience to share.

"Any fool can write code that a computer can understand. Good programmers write code that humans can understand."
- Martin Fowler, Refactoring - Improving the Design of Existing Code
Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 20
Posted: 2008-07-29 11:50

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 857
Posted: 2008-07-29 17:27

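memcached is not a distributed cache system in the full sense either because, as you know, memcached server nodes do not replicate data to each other. All you get is that when one server fails you can still use another, but the data then has to be fetched again from the DB or somewhere else.

The original poster uses a Runtime hook to save data that has not yet been synced to the database. That is very dangerous: if the server exits abnormally, the data is lost, which cannot be tolerated in important systems.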
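To make the concern concrete, here is a minimal sketch of a write-behind buffer that relies on a shutdown hook. It is an assumed shape, not the original poster's code: `WriteBehindBuffer` and its methods are hypothetical, and a plain list stands in for the database. `Runtime.addShutdownHook` is the standard JDK call; the hook runs on a normal exit or an ordinary termination signal, but not when the process is killed forcibly (for example `kill -9`), the JVM crashes, or the machine loses power, so anything still pending at that moment is lost.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a write-behind buffer flushed by a JVM shutdown hook.
class WriteBehindBuffer {
    private final List<String> pending = new ArrayList<>();
    private final List<String> store; // stands in for the database

    WriteBehindBuffer(List<String> store) { this.store = store; }

    // Writes are only buffered in memory here; nothing reaches the store yet.
    synchronized void write(String row) { pending.add(row); }

    // Moves all buffered rows to the store and returns how many were flushed.
    synchronized int flush() {
        int n = pending.size();
        store.addAll(pending);
        pending.clear();
        return n;
    }

    synchronized int pendingCount() { return pending.size(); }

    // The hook is best effort only: it is skipped on a forced kill, a JVM crash or power loss.
    void registerShutdownHook() {
        Runtime.getRuntime().addShutdownHook(new Thread(() -> { flush(); }));
    }
}

class WriteBehindDemo {
    public static void main(String[] args) {
        List<String> database = new ArrayList<>();
        WriteBehindBuffer buffer = new WriteBehindBuffer(database);
        buffer.registerShutdownHook();
        buffer.write("topicId=2008,userId=2046");
        // Until flush() runs, this row exists only in JVM memory.
        System.out.println("pending rows: " + buffer.pendingCount());
    }
}
```

Flushing on a short timer or after every N writes narrows the window of loss, but only a synchronous write to durable storage removes it.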




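Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 20
Posted: 2008-07-29 18:57
The poster above is quite right. I very much welcome suggestions from people like wes109 and floater, who actually looked at the system before commenting. Indeed, this system is not enough for applications that need a high level of safety. Generally speaking, such applications have little need for a cache anyway: a banking system, for example, has almost no public display area and few public lists, so there is not much to cache, and using database transactions directly is safer.

I read floater's 11 points again carefully. What puzzles me is why someone who can read my article replies in English.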









Ninth: cache memory management. I didn't quite understand this one.





Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 1594
Posted: 2008-07-29 21:07
floater really is the professional here, haha. Only after getting these 11 items straight can one say:

Also, if EJB were no good, Sun might as well give up, haha.

Actually, this kind of thing has a lot of practical value, but it is not simple. I hope the original poster studies it in depth.

jameszhang edited on 2008-07-29 21:14


Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 20
Posted: 2008-07-30 09:30


Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 857
Posted: 2008-07-30 10:11



Re: My caching approach and source code (Java version, with test JSP) [Re:wes109]

Posts: 1594
Posted: 2008-07-30 19:12
wes109 wrote:



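The system I am talking about is mainly an efficient database caching system built on top of Hibernate, which includes a distributed solution. The system is already in use on 舍得网 .......

Otherwise floater would not have told him so much! English is not easy to type either, haha.
Also, a "database caching system" that does not even support transactions? Better to call it a "data cache".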

jameszhang edited on 2008-07-30 19:16


Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Java Jedi

Posts: 3233
Posted: 2008-07-31 02:28
I was kind of in a rush at work, so I just threw some highlights out there.

Let me try to expand a little bit.
Problem: to improve performance, reduce the bottleneck around databases.
Open source solutions:
JCS, Terracotta, OSCache, EHCache, WhirlyCache, JCache, SwarmCache, JBoss Cache, memcached, etc.

Commercial solutions: Tangosol, Gigaspaces, etc.
Other candidates: in-memory databases, Sybase RAP, but they are not distributed.

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Java Jedi

Posts: 3233
Posted: 2008-07-31 03:32
There are two aspects to this subject:
1. being distributed: across the JVM boundary to form a larger, unified system.
2. data management: how to manage the data across the JVM boundary.
Each of these is already complicated enough. When they are combined, it can only get worse.

I am trying to expand on each point, with either a concern or a requirement, to provide some background explanation.

The 11 points that I posted are not from my imagination or copied from textbooks. They come from my experience at work, reading on the internet, trying out some APIs, requirements from work, etc. I think there are more points, but they are beyond my experience and expertise. If you want to go beyond these points, you need to consult more experienced, specialized engineers, such as the folks working at Tangosol.

1. clustered: another term is data coherence. We need to duplicate data across JVMs with n copies, where n can be specified by end users. Otherwise we can't survive a single JVM failure. JBoss and others use JGroups to do this. The tricky part is that while we duplicate data, we need to minimize the network traffic, otherwise the performance is terrible. Terracotta sends only the delta (the change, either new or updated) across the network. I tested Tangosol before; its performance is great (x MB/second, x is confidential).

2. scalable: adding new machines to expand the total memory of the entire cache. None of the open source solutions has this feature. Another term associated with this is data grid. The idea is that we have so many cheap machines, and if we can hook them together, we can form a huge, usable, fast memory cache (100 times faster than disks). The hard part of implementing this is determining which machine stores which values. Normally they come up with some kind of algorithm that hashes the key to decide which machine to go to. This is where the cache providers get paid big time. I know a lot of companies are using hundreds of GB, and they are still growing. TB is not something in the far future; it is likely next year.
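The post does not say which algorithm the vendors use to map keys to machines. One well-known option is consistent hashing, sketched below as an illustration only. The class `HashRing`, the node names, the CRC32 hash and the number of virtual nodes are all choices made for this example, not anything taken from Tangosol or another product.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Hypothetical sketch of consistent hashing: keys and nodes share one hash ring,
// and a key is stored on the first node point at or after the key's hash.
class HashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    HashRing(List<String> nodes, int virtualNodes) {
        // Several points per node spread the keys more evenly around the ring.
        for (String node : nodes) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(node + "#" + i), node);
            }
        }
    }

    // CRC32 is used only because it ships with the JDK; any well-spread hash would do.
    static long hash(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    String nodeFor(String key) {
        if (ring.isEmpty()) throw new IllegalStateException("no cache nodes configured");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(key));
        // Wrap around to the first point when the key hashes past the last one.
        return (e != null ? e : ring.firstEntry()).getValue();
    }
}

class HashRingDemo {
    public static void main(String[] args) {
        HashRing ring = new HashRing(Arrays.asList("cache-a", "cache-b", "cache-c"), 100);
        System.out.println("userId=2046 is stored on " + ring.nodeFor("userId=2046"));
    }
}
```

The appeal over a plain `hash % machineCount` is that adding a machine moves only the keys that land on the new machine's points; every other key stays where it was.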

3. It would be better to support both UDP and TCP, because UDP is easier when adding new machines, but sometimes UDP is just not available due to company policies. So it would be better to use TCP for testing and UDP for production.
Commercial solutions always have both. JBoss uses JGroups and thus has both too, but others are too lazy to include TCP.

4. Transactions are very, very crucial for data integrity. However, this is related to the data granularity and how atomic the set/get methods are. Global transactions (such as JDBC + JMS) are another reason for this.

5. backup to secondary storage: most of the cache implementations have this feature. My reason for this is that I need to keep a certain number of days of data because of business requirements. They could be regulatory, back testing, etc. Backup could be asynchronous and multi-threaded.

6. Some of the cache implementations have ODBC/JDBC drivers, and they can be treated as a relational cache, meaning you use the SQL language to search through the value objects. These caches are termed data stores. One simple way to implement them is to use JavaCC to write a SQL parser, treat tables/columns as Java objects/attributes, and then run this parser across the entire cache. Since everything is in memory, a trivial implementation is not slow, though it could be faster. memcached has a lot of APIs for different languages. All commercial solutions have Java/.NET/C interfaces, since a lot of financial applications have a .NET/Excel interface and a Java back end.

7. The near/far cache management: JBoss has the TcpCacheLoader class. My scenario is that I have some memory-hungry process that by itself will take nearly all the JVM memory, so I can't spend extra on cache in the same thread. I want to throw my results into an external cache, external to my current process. Most of the implementations will take some memory from the current process and join the cache cluster. They don't have the option to externalize the entire cache. Tangosol has an option to specify how much memory you want to give to the near cache (the cache in the current process). This is a vital feature in a distributed environment because most of the time we go distributed due to either being short of memory or long running times (in which case we go to grid computing), and in either case we need to collect the results back to a central location, either a database or a cache.
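A rough sketch of the near/far split may help here. It assumes a small, size-bounded LRU map in the current process in front of an external cache, with a plain `Map` standing in for the far side. The class `NearFarCache` and its capacity parameter are hypothetical; this is not the Tangosol or JBoss API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a bounded in-process "near" cache in front of an external "far" cache.
class NearFarCache<K, V> {
    private final Map<K, V> far;   // stands in for the external cache cluster
    private final Map<K, V> near;  // LRU map living in the current process
    private int farReads = 0;

    NearFarCache(Map<K, V> far, final int nearCapacity) {
        this.far = far;
        // accessOrder=true makes the LinkedHashMap evict the least recently used entry.
        this.near = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > nearCapacity;
            }
        };
    }

    synchronized V get(K key) {
        V value = near.get(key);
        if (value == null) {
            farReads++;                    // near miss: go to the far cache
            value = far.get(key);
            if (value != null) near.put(key, value);
        }
        return value;
    }

    // Results always go to the far cache; the near copy is subject to eviction.
    synchronized void put(K key, V value) {
        far.put(key, value);
        near.put(key, value);
    }

    synchronized int nearSize() { return near.size(); }

    synchronized int farReads() { return farReads; }
}

class NearFarDemo {
    public static void main(String[] args) {
        NearFarCache<String, String> cache = new NearFarCache<>(new HashMap<String, String>(), 2);
        cache.put("a", "1");
        cache.put("b", "2");
        cache.put("c", "3");               // "a" is evicted from the near cache only
        System.out.println(cache.get("a") + ", far reads: " + cache.farReads());
    }
}
```

Setting the near capacity very low approximates the "externalize the entire cache" case described above, at the cost of a far read on almost every lookup.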

8. A lock is the minimal requirement for concurrent access. In a distributed environment we need more than a lock, much more, because we have not only concurrent access but also distributed access. So we will have deadlock, starvation, etc.

9. distributed event handling through JMS. It's kind of a surprise that no implementation except one does this. I encountered four cases in the last two years where JMS + cache would have solved a hell of a lot of problems, which is kind of ironic. This is not a new idea. Rod Johnson implemented it in the source code of his first book; just copy and paste. This feature is just as powerful as its non-distributed version. It can be added outside caches as an add-on, but a built-in would be a timesaver.

Tangosol has all but the last one. And the performance is good in my view. Another one is Gigaspaces, but I don't like their sales folks and talks. I talked to both companies' sales folks before.

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Java Jedi

Posts: 3233
Posted: 2008-07-31 03:53
I can't type in Chinese; this has already been asked thousands of times. I can read Chinese, both simplified and traditional, but I constantly pronounce it wrong, and my wife laughs at me (and my daughter laughs at my English too, what am I doing? :)).

EJB and Hibernate merely use a cache at most; they have nothing to do with cache implementation.

I happened to work in a distributed environment for the last 3 years and accumulated some experience. But the internet is still a greater place to learn. There are so many nice folks, including folks in here, who are willing to share their knowledge and experience. Did you cuil today? :)

Keep it simple, but not simpler - Albert Einstein

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 20
Posted: 2008-07-31 10:04

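Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 322
Posted: 2008-08-01 10:00
Very good. The site I maintain now gets more than 2 to 3 million PV per day with no caching framework at all, using the crudest method: load the data into a HashMap. There are 3 front-end machines, 1 hp6850 and 2 Dells. One step requires a multi-table join; in the end a guru at our company extended Java's HashMap implementation so that the joined data is written to files, and the files are read every time. The one thing giving me a headache now is that the load ("loadbalance", i.e. the load average) on the two Dell front-end machines never comes down. It stays above 1, even though CPU usage is not above 90%. The HP machine is a bit better. Every PV involves 3 socket operations, though, and I wonder whether that is the cause. By the way, does anyone have experience tracking down why the load will not come down on a machine running Java/JSP? Please share.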

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Java Jedi

Posts: 3233
Posted: 2008-08-03 02:58
maybe slow network?

maybe hashcode not balanced?

Re: My caching approach and source code (Java version, with test JSP) [Re:floater]

Posts: 322
Posted: 2008-08-04 10:43
floater wrote:
maybe slow network?

maybe hashcode not balanced?

This is what is giving me a headache. Below is the output of top after pressing Shift+H, on Debian Linux:

top - 10:25:10 up 310 days, 14:39, 1 user, load average: 1.61, 2.01, 1.96
Tasks: 851 total, 3 running, 847 sleeping, 0 stopped, 1 zombie
Cpu(s): 11.4%us, 0.6%sy, 0.0%ni, 87.9%id, 0.0%wa, 0.0%hi, 0.1%si, 0.0%st
Mem: 16433220k total, 12911932k used, 3521288k free, 158728k buffers
Swap: 1951888k total, 48k used, 1951840k free, 4562588k cached

18996 sms 18 0 3735m 2.0g 46m S 53 12.5 8383:25 java
3144 sms 15 0 130m 13m 796 S 7 0.1 0:02.21 httpd
28960 sms 15 0 245m 33m 812 S 6 0.2 1:16.81 httpd
30492 sms 15 0 1023m 165m 1396 S 5 1.0 9:19.39 httpd
16878 sms 15 0 476m 73m 1392 S 5 0.5 3:42.12 httpd
23049 sms 15 0 344m 50m 1396 R 5 0.3 2:20.75 httpd
1843 sms 15 0 152m 17m 812 S 5 0.1 0:16.52 httpd
22690 sms 15 0 353m 52m 1392 S 5 0.3 2:25.94 httpd
23261 sms 15 0 336m 49m 1396 S 5 0.3 2:15.12 httpd
2564 sms 15 0 138m 15m 800 S 4 0.1 0:07.50 httpd
18610 sms 15 0 437m 66m 1396 S 4 0.4 3:17.87 httpd
22686 sms 15 0 350m 52m 1396 S 4 0.3 2:23.86 httpd
25431 sms 15 0 307m 44m 1396 S 4 0.3 1:57.44 httpd
26398 sms 15 0 286m 40m 812 S 4 0.3 1:43.56 httpd
28723 sms 15 0 1063m 172m 1396 S 4 1.1 9:45.60 httpd
3354 sms 18 0 127m 13m 768 S 4 0.1 0:00.12 httpd
12019 sms 15 0 669m 105m 1396 S 4 0.7 5:37.15 httpd
25364 sms 15 0 300m 43m 1392 S 4 0.3 1:51.99 httpd
31076 sms 15 0 203m 27m 1392 S 4 0.2 0:49.51 httpd
3205 sms 15 0 3735m 2.0g 46m S 3 12.5 0:00.18 java
3256 sms 15 0 0 0 0 Z 3 0.0 0:00.84 httpd <defunct>
18519 sms 15 0 430m 65m 1396 S 3 0.4 3:13.08 httpd
18805 sms 15 0 420m 63m 1396 S 3 0.4 3:07.44 httpd
20240 sms 15 0 404m 61m 1392 S 3 0.4 2:58.74 httpd
2806 sms 15 0 134m 14m 808 S 2 0.1 0:04.72 httpd
2662 sms 15 0 3735m 2.0g 46m S 2 12.5 0:01.04 java
3222 sms 15 0 3735m 2.0g 46m S 2 12.5 0:00.19 java
15854 sms 16 0 521m 80m 1396 S 2 0.5 4:08.46 httpd
18518 sms 15 0 440m 67m 1396 S 2 0.4 3:19.58 httpd
22687 sms 15 0 346m 51m 1396 S 2 0.3 2:21.96 httpd
25679 sms 15 0 289m 41m 812 S 2 0.3 1:45.03 httpd
1700 sms 15 0 152m 18m 1392 S 2 0.1 0:16.16 httpd
2619 sms 15 0 137m 15m 800 S 2 0.1 0:06.47 httpd
1512 sms 15 0 3735m 2.0g 46m S 2 12.5 0:02.07 java
2692 sms 15 0 3735m 2.0g 46m S 2 12.5 0:00.88 java
26200 sms 15 0 287m 41m 1396 R 2 0.3 1:44.63 httpd
2929 sms 16 0 3735m 2.0g 46m S 1 12.5 0:00.52 java
3000 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.41 java
3219 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.08 java
3266 sms 15 0 3735m 2.0g 46m S 1 12.5 0:00.10 java
3198 sms 15 0 11256 1936 952 R 1 0.0 0:01.11 top
2296 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.47 java
2322 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.33 java
2334 sms 15 0 3735m 2.0g 46m S 1 12.5 0:01.19 java

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 20
Posted: 2008-08-07 17:24
The new version of 舍得网 is live, using the revised caching system. With locks and length (count) caching added in particular, it is really fast. MySQL's load average is also fairly low, about 0.2 on average.

Re: My caching approach and source code (Java version, with test JSP) [Re:liuaike]

Posts: 5
Posted: 2009-03-04 15:02

